United States Patent 6026230
Lin et al., February 15, 2000

Title

Memory simulation system and method

Abstract

The SEmulation system provides four modes of operation: (1) Software Simulation, (2) Simulation via Hardware Acceleration, (3) In-Circuit Emulation (ICE), and (4) Post-Simulation Analysis. At a high level, the present invention may be embodied in each of the above four modes or various combinations of these modes. At the core of these modes is a software kernel which controls the overall operation of this system. The main control loop of the kernel executes the following steps: initialize system, evaluate active test-bench processes/components, evaluate clock components, detect clock edge, update registers and memories, propagate combinational components, advance simulation time, and continue the loop as long as active test-bench processes are present. The Memory Mapping aspect of the invention provides a structure and scheme where the numerous memory blocks associated with the user's design are mapped into the SRAM memory devices in the Simulation system instead of inside the logic devices, which are used to configure and model the user's design. The Memory Mapping or Memory Simulation system includes a memory state machine, an evaluation state machine, and their associated logic to control and interface with: (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the Simulation system, and (3) the FPGA logic devices which contain the configured and programmed user design that is being debugged.


Inventors: Lin; Sharon Sheau-Pyng (Cupertino, CA), Tseng; Ping-Sheng (Sunnyvale, CA)
Assignee: Axis Systems, Inc. (Sunnyvale, CA)
Appl. No.: 019328
Filed: February 5, 1998

Current U.S. Class: 703/13
Current International Class: G06F 17/50 (20060101)
Field of Search: 395/500.34, 500.35, 500.36, 500.37, 500.38, 500.44; 711/100

Primary Examiner: Stamber; Eric W.
Assistant Examiner: Knox; Lonnie A.
Attorney, Agent or Firm: Chou; Chien-Wei (Chris), Oppenheimer Wolff & Donnelly LLP, Hamrick; Claude A. S.

Parent Case Text



RELATED U.S. APPLICATION

This is a continuation-in-part of U.S. patent application Ser. No. 08/850,136, which was filed with the United States Patent and Trademark Office (USPTO) on May 2, 1997.

Claims


We claim:
1. A memory mapping system for mapping at least one memory block from at least one logic device to at least one memory device in a reconfigurable hardware unit, the reconfigurable hardware unit including a bus controller, at least one logic device for modeling at least a portion of the user design in hardware where the hardware model has at least one memory block and associated user memory interface, at least one memory device, a bus subsystem coupling at least one logic device, at least one memory device, and the bus controller, the memory mapping system comprising:
a bus driver coupled to the bus subsystem;
a memory block interface coupled to the bus driver, the bus subsystem, and the user memory interface to handle write/read memory access between at least one logic device and at least one memory device, at least one memory device storing the memory blocks associated with the hardware model; and
an evaluation logic in each logic device coupled to the hardware model, the bus driver, the memory block interface, and the bus controller for providing evaluation control signals, the evaluation control signals used to evaluate data in the hardware model and to control write/read memory access between at least one logic device and at least one memory device via the bus driver and the memory block interface.

2. The system of claim 1, wherein the memory block interface further comprises:
a memory converter for interfacing with the user memory interface and converting the user memory type into the type of memory of the memory device in the reconfigurable hardware unit; and
a buffer coupled to the bus subsystem, the evaluation logic, and the user memory interface for receiving data from the bus subsystem.

3. The system of claim 2, wherein the buffer is a double buffer.

4. The system of claim 3 wherein the double buffer further comprises:
a first flip-flop having a first data input, a first data output, and a first control input, wherein the first data input is coupled to the bus subsystem for receiving data, the first control input coupled to the evaluation logic for receiving evaluation control signals; and
a second flip-flop having a second data input, a second data output, and a second control input, wherein the second data input is coupled to the first data output, the second control input is coupled to the evaluation logic for receiving evaluation control signals, and the second data output is coupled to the user memory interface.

5. The system of claim 4 wherein the first flip-flop and the second flip-flop are D-type flip-flops.

6. The system of claim 5 wherein the first control input receives a read latch signal from the evaluation logic for latching data on the first data input, and the second control input receives a clock enable signal from the evaluation logic to buffer in the data on the second data input to the second data output.

7. The system of claim 2 wherein the memory converter further comprises:
a memory model for receiving memory address and control signals from the user memory interface, converting the user memory type into the type of memory of the memory device in the reconfigurable hardware unit, and outputting a converted control signal to the bus driver and a converted address; and
an address offset unit for receiving the converted address and generating an offset address to eliminate any overlaps in memory address among the memory blocks, the offset address provided to the bus driver.

8. The system of claim 1 wherein the bus driver is a multiplexer having a plurality of mux inputs, a mux control input, and a mux output coupled to the bus subsystem.

9. The system of claim 8 wherein the plurality of mux inputs further comprises:
a first mux input for providing data associated with DMA read transfer for the hardware-to-software data, a second mux input for providing data associated with DMA read transfer for register read data, a third mux input for data associated with the user memory interface, and a fourth mux input for data associated with memory write data.

10. The system of claim 9 wherein the mux control input further comprises a select signal for selecting among the plurality of mux inputs and an output enable signal for enabling the function of the multiplexer.

11. The system of claim 1 wherein the evaluation logic includes input control signals including an evaluation signal from the bus controller to control and indicate the activation of data evaluation of at least one logic device, a shiftin signal to indicate that the logic device associated with the evaluation logic will evaluate data, and a write control signal from the memory block interface to control and indicate the activation of a write operation from the logic device to at least one memory device.

12. The system of claim 1 wherein the evaluation logic includes evaluation control signals including a shiftout signal to indicate that the logic device associated with the evaluation logic will evaluate the last memory block in the logic device, a read latch signal to the memory block interface to control the reading of data from the memory device to the logic device, bus driver control signals to control the operation of the bus driver, and a plurality of data evaluation signals to evaluate data in the hardware model.

13. A simulation system operating in a host computer system for simulating a behavior of a circuit, the host computer system including a central processing unit (CPU), main memory, a local bus coupling the CPU to main memory and allowing communication between the CPU and main memory, and a system bus, the circuit having a structure and a function specified in a hardware language, the hardware language capable of describing the circuit as component types and connections, comprising:
a software model of the circuit coupled to the local bus;
software control logic coupled to the software model and a hardware logic element, for controlling the operation of the software model and said hardware logic element, including
interface logic which is capable of receiving input data and a clock signal from an external process, and
clock detection logic for detecting an active edge of the clock signal and generating a trigger signal; and
said hardware logic element coupled to the system bus and including
a system bus controller,
a hardware model bus coupled to the system bus controller,
at least one logic device and at least one memory device coupled to the hardware model bus,
a hardware model of at least a portion of the circuit residing in at least one logic device, the hardware logic element including clock enable logic for evaluating data in the hardware model in response to the trigger signal, and
a memory mapping system for mapping at least one memory block associated with the circuit in the hardware model from at least one logic device to at least one memory device.

14. The system of claim 13 wherein the memory mapping system further comprises:
a bus driver coupled to the hardware model bus;
a memory block interface for each memory block, the memory block interface coupled to the bus driver, the hardware model bus, and the hardware model to handle write/read memory access between at least one logic device and at least one memory device, at least one memory device storing the memory blocks associated with the hardware model; and
an evaluation logic in each logic device coupled to the hardware model, the bus driver, the memory block interface, and the system bus controller for providing evaluation control signals, the evaluation control signals used to evaluate data in the hardware model and to control write/read memory access between at least one logic device and at least one memory device via the bus driver and the memory block interface.

15. The system of claim 14, wherein the memory block interface further comprises:
a memory converter for interfacing with the hardware model and converting the user memory type into the type of memory of the memory device in the hardware logic element; and
a double buffer coupled to the hardware model bus, the evaluation logic, and the hardware model for receiving data from the hardware model bus.

16. The system of claim 15 wherein the double buffer further comprises:
a first flip-flop having a first data input, a first data output, and a first control input, wherein the first data input is coupled to the bus subsystem for receiving data, the first control input coupled to the evaluation logic for receiving evaluation control signals; and
a second flip-flop having a second data input, a second data output, and a second control input, wherein the second data input is coupled to the first data output, the second control input is coupled to the evaluation logic for receiving evaluation control signals including the trigger signal, and the second data output is coupled to the user memory interface.

17. A method of mapping memory blocks from at least one logic device to at least one memory device in a simulation system, the simulation system including a host computing system and a reconfigurable hardware system, the reconfigurable hardware system including at least one logic device and at least one memory device, the memory blocks associated with a user circuit design which is to be simulated, comprising:
generating a software model of the circuit;
generating a hardware model of at least a portion of the circuit;
configuring the hardware model in at least one logic device;
storing information from selected memory blocks located in at least one logic device to at least one memory device; and
performing data transfers among the host computer system, the logic devices and the memory devices selectively.

18. The method of claim 17, wherein the step of performing further comprises:
performing direct memory access (DMA) operation between the host computer system and at least one logic device;
performing evaluation operation between logic devices; and
performing memory access operation of memory blocks between at least one logic device and at least one memory device.

19. The method of claim 18, wherein the step of performing memory access operation is accomplished sequentially one logic device at a time.

20. The method of claim 18, wherein the steps of performing DMA operation, evaluation operation and memory access operation occur at substantially separate time intervals.
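The address offset unit of claim 7 and the one-logic-device-at-a-time memory access of claim 19 can be illustrated with a minimal Python sketch. It assumes, for illustration only, that user memory blocks are packed back to back into one flat SRAM address space and that each block is identified by a (logic device, block name) pair; `MemoryBlock`, `assign_offsets`, `offset_address`, and `memory_access_phase` are hypothetical names, not taken from the patent, and the sketch omits the bus driver, double buffer, and memory-type conversion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryBlock:
    """A user memory block originally modeled inside a logic device (hypothetical type)."""
    fpga_id: int   # logic device (FPGA) that originally held the block
    name: str      # block identifier within the user design
    depth: int     # number of words in the block


def assign_offsets(blocks):
    """Pack blocks back to back so that no two blocks share an SRAM address.

    Returns a dict mapping (fpga_id, name) to the block's base offset.
    """
    offsets = {}
    next_free = 0
    for blk in blocks:
        offsets[(blk.fpga_id, blk.name)] = next_free
        next_free += blk.depth
    return offsets


def offset_address(offsets, blk, local_addr):
    """Translate a block-local (converted) address into a non-overlapping SRAM address."""
    if not 0 <= local_addr < blk.depth:
        raise IndexError(f"address {local_addr} out of range for block {blk.name}")
    return offsets[(blk.fpga_id, blk.name)] + local_addr


def memory_access_phase(blocks, offsets, sram, requests):
    """Service pending block accesses sequentially, one logic device at a time.

    `requests` maps (fpga_id, name) to a list of ("r", addr) or ("w", addr, data).
    Returns read data in the order the reads were serviced.
    """
    read_data = []
    for fpga_id in sorted({b.fpga_id for b in blocks}):
        for blk in (b for b in blocks if b.fpga_id == fpga_id):
            for req in requests.get((blk.fpga_id, blk.name), []):
                addr = offset_address(offsets, blk, req[1])
                if req[0] == "w":
                    sram[addr] = req[2]
                else:
                    read_data.append(sram[addr])
    return read_data
```

For example, blocks of 32 and 64 words on device 0 and a 16-word block on device 1 receive base offsets 0, 32, and 96, so word 3 of the device-1 block maps to SRAM address 99. Contiguous packing is only one way to guarantee non-overlapping addresses; the claims require only that the offset eliminate overlaps.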

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention generally relates to electronic design automation (EDA). More particularly, the present invention relates to a simulation and emulation system implemented in both software and hardware to verify electronic systems.

2. Description of Related Art

In general, electronic design automation (EDA) is a computer-based tool configured in various workstations to provide designers with automated or semi-automated tools for designing and verifying users' custom circuit designs. EDA is generally used for creating, analyzing, and editing any electronic design for the purpose of simulation, emulation, prototyping, execution, or computing. EDA technology can also be used to develop systems (i.e., target systems) which will use the user-designed subsystem or component. The end result of EDA is a modified and enhanced design, typically in the form of discrete integrated circuits or printed circuit boards, that is an improvement over the original design while maintaining the spirit of the original design.

The value of software simulating a circuit design followed by hardware emulation is recognized in various industries that use and benefit from EDA technology. Nevertheless, current software simulation and hardware emulation/acceleration are cumbersome for the user because of the separate and independent nature of these processes. For example, the user may want to simulate or debug the circuit design using software simulation for part of the time, use those results and accelerate the simulation process using hardware models during other times, inspect various register and combinational logic values inside the circuit at select times, and return to software simulation at a later time, all in one debug/test session. Furthermore, as internal register and combinational logic values change as the simulation time advances, the user should be able to monitor these changes even if the changes are occurring in the hardware model during the hardware acceleration/emulation process.

Co-simulation arose out of a need to address some problems with the cumbersome nature of using two separate and independent processes of pure software simulation and pure hardware emulation/acceleration, and to make the overall system more user-friendly. However, co-simulators still have a number of drawbacks: (1) co-simulation systems require manual partitioning, (2) co-simulation uses two loosely coupled engines, (3) co-simulation speed is as slow as software simulation speed, and (4) co-simulation systems encounter race conditions.

First, partitioning between software and hardware is done manually, instead of automatically, further burdening the user. In essence, co-simulation requires the user to partition the design (starting with behavior level, then RTL, and then gate level) and to test the models themselves among the software and hardware at very large functional blocks. Such a constraint requires some degree of sophistication by the user.

Second, co-simulation systems utilize two loosely coupled and independent engines, which raise inter-engine synchronization, coordination, and flexibility issues. Co-simulation requires synchronization of two different verification engines--software simulation and hardware emulation. Even though the software simulator side is coupled to the hardware accelerator side, only external pin-out data is available for inspection and loading. Values inside the modeled circuit at the register and combinational logic level are not available for easy inspection and downloading from one side to the other, limiting the utility of these co-simulator systems. Typically, the user may have to re-simulate the whole design if the user switches from software simulation to hardware acceleration and back. Thus, co-simulator systems do not allow the user to switch between software simulation and hardware emulation/acceleration during a single debug session while inspecting register and combinational logic values.

Third, co-simulation speed is as slow as software simulation speed. Co-simulation requires synchronization of two different verification engines--software simulation and hardware emulation. Each of the engines has its own control mechanism for driving the simulation or emulation. Because the two engines must stay synchronized, the overall performance is limited to that of the slower engine, the software simulator. The additional overhead to coordinate the operation of these two engines slows co-simulation systems further.

Fourth, co-simulation systems encounter set-up and hold time problems due to race conditions among clock signals. Co-simulators use hardware-driven clocks, which may arrive at the inputs of different logic elements at different times because of differing wire lengths. This increases the uncertainty of evaluation results: some logic elements evaluate data in one time period and others in a different time period, when these logic elements should be evaluating the data together.
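The race can be made concrete with a two-stage shift register, D to FF1 to FF2. The minimal Python sketch below illustrates the general clock-skew hazard rather than any code from the patent; `clock_edge` and `run_shift_register` are hypothetical names. With ideal clocking, FF2 captures FF1's previous value, so data takes two edges to reach FF2; if the clock reaches FF2 late, after FF1 has already changed, the new value races through both stages on a single edge.

```python
def clock_edge(d, q1, skewed):
    """Return (new_q1, new_q2) for a D -> FF1 -> FF2 shift register after one clock edge.

    skewed=False: both flip-flops sample their inputs at the same instant.
    skewed=True:  the clock reaches FF2 only after FF1 has already updated.
    """
    if not skewed:
        # Ideal clocking: FF2 samples Q1 as it was before the edge.
        return d, q1
    # Skewed clocking: FF2 sees the already-updated Q1 (a hold-time violation).
    new_q1 = d
    return new_q1, new_q1


def run_shift_register(bits, skewed):
    """Clock a stream of input bits through the register; return Q2 after each edge."""
    q1 = 0
    outputs = []
    for d in bits:
        q1, q2 = clock_edge(d, q1, skewed)
        outputs.append(q2)
    return outputs
```

Running the bit stream [1, 0, 1, 1] gives [0, 1, 0, 1] at FF2 with ideal clocking but [1, 0, 1, 1] with skew: the intended one-edge delay between the stages disappears.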

In addition to these problems, the industry has not provided an effective way to give multiple users or multiple processes simultaneous access to a simulation system. Typically, only one workstation or process is coupled to a single simulation system.

Memory management is another problem in the industry. Existing simulation or emulation systems do not effectively address memory allocation and access issues. As known to those skilled in the art, the configured and mapped user designs are associated with many memory blocks in each FPGA chip. These memory blocks are scattered throughout each FPGA chip. When the computing environment (e.g., simulation software and central processing unit) needs to access a particular memory block, it must do so through a separate memory controller or look in each FPGA chip via its own memory controller. Memory access thus becomes slow and cumbersome. Moreover, these simulation and emulation systems dedicate certain pins in each FPGA for memory access purposes, wasting limited chip pin and functional resources. Also, when each FPGA chip holds numerous memory blocks, memory access becomes awkward.

Existing FPGA board-to-motherboard connection schemes are also inadequate as motherboard space comes at a premium and signal reliability becomes more of an issue than ever. Because each FPGA chip has limited capacity, several FPGA chips, and several FPGA boards holding those chips, must be used to accommodate large and complicated user circuit designs. As more boards are used, space on the motherboard becomes an issue. If a single connector is used to couple one FPGA board to the motherboard, the number of FPGA boards that can be coupled to the motherboard is limited by the size of these connectors. Given the large size of these connectors, the density of FPGA boards on motherboards is severely restricted. Furthermore, when multiple connectors are used to couple one FPGA board to the motherboard, signal reliability becomes an issue. With more connectors arranged along any given signal path, the chances of signal attenuation and reflection increase, thus decreasing signal reliability. During shipping and handling of systems using multiple board-to-motherboard connectors, the vibrations resulting from the physical handling of these systems may cause certain connections to decouple. With such decoupling, signal reliability becomes a concern; that is, while some signals reach their designated destinations, other signals may never get there due to severed signal paths.

Another problem associated with current board-to-motherboard connection schemes is that when a backplane is not available, all signals transmitted between these FPGA boards must be routed to the connectors on the motherboard first. Such a requirement adds to the signal trace length and increases delay during execution. An interconnect scheme must be provided to minimize such long signal trace lengths.

Accordingly, a need exists in the industry for a system or method that addresses problems raised by currently known simulation systems, hardware emulation systems, hardware accelerators, and co-simulation systems.

SUMMARY OF THE INVENTION

The present invention provides solutions to the aforementioned problems in the form of a flexible and fast simulation/emulation system, called herein the "SEmulation system" or "SEmulator system."

One object of the present invention is to provide a system that provides the speed of a hardware accelerator with the control of a software simulator.

Another object of the present invention is to provide a software simulator and a hardware accelerator with a single engine.

Still another object of the present invention is to provide a system with different modes of operation (e.g., software simulation, hardware acceleration, ICE, and post-simulation analysis) and the ability to switch among these different modes with relative ease.

A further object of the present invention is to provide a system that automatically provides hardware and software models of the user's custom circuit design.

Still yet another object of the present invention is to provide a means and method of avoiding race conditions.

The SEmulation system and method of the present invention provide users the ability to turn their designs of electronic systems into software and hardware representations for simulation. Generally, the SEmulation system is a software-controlled emulator or a hardware-accelerated simulator and the methods used therein. Thus, pure software simulation is possible, but the simulation can also be accelerated through the use of the hardware model. Hardware acceleration is possible with software control for starting, stopping, asserting values, and inspecting values. In-circuit emulation mode is also available to test the user's circuit design in the environment of the circuit's target system. Again, software control is available.

At the core of the system is a software kernel that controls both the software and hardware models to provide greater run-time flexibility for the user by allowing the user to start, stop, assert values, inspect values, and switch among the various modes. The kernel controls the various modes by controlling data evaluation in the hardware via the enable inputs to the registers.

The SEmulation system and method, in accordance with the present invention, provide four modes of operation: (1) Software Simulation, (2) Simulation via Hardware Acceleration, (3) In-Circuit Emulation (ICE), and (4) Post-Simulation Analysis. At a high level, the present invention is embodied in each of the above four modes or various combinations of these modes as follows: (1) Software Simulation alone; (2) Simulation via Hardware Acceleration alone; (3) In-Circuit Emulation (ICE) alone; (4) Post-Simulation Analysis alone; (5) Software Simulation and Simulation via Hardware Acceleration; (6) Software Simulation and ICE; (7) Simulation via Hardware Acceleration and ICE; (8) Software Simulation, Simulation via Hardware Acceleration, and ICE; (9) Software Simulation and Post-Simulation Analysis; (10) Simulation via Hardware Acceleration and Post-Simulation Analysis; (11) Software Simulation, Simulation via Hardware Acceleration, and Post-Simulation Analysis; (12) ICE and Post-Simulation Analysis; (13) Software Simulation, ICE, and Post-Simulation Analysis; (14) Simulation via Hardware Acceleration, ICE, and Post-Simulation Analysis; and (15) Software Simulation, Simulation via Hardware Acceleration, ICE, and Post-Simulation Analysis. Other combinations are possible and within the scope of the present invention.
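The fifteen enumerated embodiments are exactly the non-empty subsets of the four modes, 2^4 - 1 = 15 (four single modes, six pairs, four triples, and one set of all four). The short Python sketch below, illustrative only, generates them with `itertools.combinations`; it orders the subsets by size, which differs from the numbering above.

```python
from itertools import combinations

MODES = (
    "Software Simulation",
    "Simulation via Hardware Acceleration",
    "In-Circuit Emulation (ICE)",
    "Post-Simulation Analysis",
)


def mode_combinations(modes=MODES):
    """Return every non-empty combination of the given modes, smallest subsets first."""
    return [
        combo
        for size in range(1, len(modes) + 1)
        for combo in combinations(modes, size)
    ]
```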

Each mode or combination of modes provides the following features or combinations of features: (1) Switching among modes, manually or automatically; (2) Usage--the user can switch among modes, and can start, stop, assert values, inspect values, and single-step cycle through the simulation or emulation process; (3) Compilation process to generate software models and hardware models; (4) Software kernel to control all modes with a main control loop that includes, in one embodiment, the steps of initialize system, evaluate active test-bench processes/components, evaluate clock components, detect clock edge, update registers and memories, propagate combinational components, advance simulation time, and continue the loop as long as active test-bench processes are present; (5) Component type analysis for generating hardware models; (6) Mapping hardware models to reconfigurable boards through, in one embodiment, clustering, placement, and routing; (7) Software clock set-up to avoid race conditions through, in one embodiment, gated clock logic analysis and gated data logic analysis; (8) Software clock implementation through, in one embodiment, clock edge detection in the software model to trigger an enable signal in the hardware model, sending a signal from the primary clock to the clock input of the clock edge register in the hardware model via the gated clock logic, sending a clock enable signal to the enable input of the hardware model's register, sending data from the primary clock register to the hardware model's register via the gated data logic, and resetting the clock edge register to disable the clock enable signal to the enable input of the hardware model's registers; (9) Selective logging of data for debug sessions and post-simulation analysis; (10) Combinational logic regeneration; (11) In one embodiment, a basic building block that is a D-type register with asynchronous inputs and synchronous inputs; (12) Address pointers in each chip; (13) Multiplexed cross-chip address pointer chain; (14) Array of FPGA chips and their interconnection scheme; (15) Banks of FPGA chips with a bus that tracks the performance of the PCI bus system; (16) FPGA banks that allow expansion via piggyback boards; and (17) Time division multiplexed (TDM) circuit for optimal pin usage. The present invention, through its various embodiments, provides other features, discussed herein, that may not appear in the above list.

One embodiment of the present invention is a simulation system. The simulation system operates in a host computer system for simulating a behavior of a circuit. The host computer system includes a central processing unit (CPU), main memory, and a local bus coupling the CPU to main memory and allowing communication between the CPU and main memory. The circuit has a structure and a function specified in a hardware language, such as HDL, which is capable of describing the circuit as component types and connections. The simulation system includes a software model, software control logic, and a hardware logic element.

The software model of the circuit is coupled to the local bus. Typically, it resides in main memory. The software control logic is coupled to the software model and the hardware logic element, for controlling the operation of the software model and the hardware logic element. The software control logic includes interface logic that is capable of receiving input data and a clock signal from an external process, and clock detection logic for detecting an active edge of the clock signal and generating a trigger signal. The hardware logic element is also coupled to the local bus and includes a hardware model of at least a portion of the circuit based on component type, and clock enable logic for evaluating data in the hardware model in response to the trigger signal.

The hardware logic element also comprises an array or plurality of field programmable devices coupled together. Each field programmable device includes a portion of the hardware model of the circuit and thus, the combination of all the field programmable devices includes the entire hardware model. A plurality of interconnections also couple the portions of the hardware model together. Each interconnection represents a direct connection between any two field programmable devices located in the same row or column. The shortest path between any two field programmable devices in the array is at most two interconnections or "hops."
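The two-hop bound follows directly from the row/column interconnection rule: two devices that share a row or column are one hop apart, and any other pair can route through the device that shares a row with one and a column with the other. A minimal sketch, assuming devices are addressed by hypothetical (row, col) coordinates:

```python
from itertools import product


def hops(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Interconnections needed between devices a and b, given as (row, col).

    Assumes a direct interconnection between any two devices that share a
    row or a column, as the text describes.
    """
    if a == b:
        return 0
    if a[0] == b[0] or a[1] == b[1]:
        return 1  # same row or same column: one direct interconnection
    # Otherwise route via (a.row, b.col): it shares a row with a and a column with b.
    return 2


def max_hops(rows: int, cols: int) -> int:
    """Worst-case hop count over every pair of devices in a rows x cols array."""
    cells = list(product(range(rows), range(cols)))
    return max(hops(a, b) for a in cells for b in cells)


print(max_hops(4, 4))  # 2
```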

Another embodiment of the present invention is a system and method of simulating a circuit, where the circuit is modeled in software and at least a portion of the circuit is modeled in hardware. Data evaluation occurs in the hardware but is controlled in software via a software clock. Data to be evaluated propagates to, and stabilizes in, the hardware model. When the software model detects an active clock edge, it sends an enable signal to the hardware model to activate data evaluation. The hardware model evaluates the data and then waits for new incoming data, which may be evaluated at the next active clock edge detected in the software model.
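The handshake can be sketched in software as follows. This is an illustrative model only, assuming a rising edge is the active edge; the class and function names are hypothetical and the hardware side is reduced to a single register with an enable input.

```python
class HardwareModel:
    """Hypothetical stand-in for one register in the hardware model."""

    def __init__(self) -> None:
        self.d = 0  # data that has propagated to and stabilized at the register input
        self.q = 0  # register output
        self.enable = False

    def evaluate(self) -> None:
        # The register updates only while the software-driven enable is asserted.
        if self.enable:
            self.q = self.d


class SoftwareClock:
    """Detects active (here: rising) edges of the user clock in software."""

    def __init__(self) -> None:
        self.prev = 0

    def rising_edge(self, clk: int) -> bool:
        edge = self.prev == 0 and clk == 1
        self.prev = clk
        return edge


def step(sw: SoftwareClock, hw: HardwareModel, clk: int, data: int) -> None:
    hw.d = data  # 1. data propagates and stabilizes first
    if sw.rising_edge(clk):
        hw.enable = True  # 2. edge detected in software: send the enable
        hw.evaluate()  # 3. hardware evaluates the stabilized data
        hw.enable = False  # 4. enable removed; wait for the next edge


sw, hw = SoftwareClock(), HardwareModel()
for clk, data in [(0, 5), (1, 7), (1, 9), (0, 3), (1, 4)]:
    step(sw, hw, clk, data)
print(hw.q)  # 4
```

Only the samples taken at the two rising edges (7, then 4) reach the register output; data presented while the clock is steady is ignored.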

Another embodiment of the present invention includes a software kernel that controls the operation of the software model and the hardware model. The software kernel's main control loop comprises the steps of evaluating active test-bench processes/components, evaluating clock components, detecting clock edges, updating registers and memories, propagating combinational components, advancing simulation time, and continuing the loop as long as active test-bench processes are present.
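The loop structure can be sketched as below. This is a minimal illustration, not the kernel itself. It assumes each test-bench process is a hypothetical callable that returns False once it has finished, that the clock component is a function of time, and that the active edge is a rising edge.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Kernel:
    """Sketch of the kernel's main control loop; all hooks are hypothetical."""

    testbench: list[Callable[[int], bool]]  # each returns False once finished
    clock: Callable[[int], int]  # clock component: clock value at time t
    on_edge: Callable[[int], None]  # update registers and memories
    propagate: Callable[[int], None]  # propagate combinational components
    time: int = 0
    prev_clk: int = 0

    def run(self) -> None:
        active = list(self.testbench)
        while active:  # continue while test-bench processes are active
            active = [p for p in active if p(self.time)]  # evaluate test-bench processes
            clk = self.clock(self.time)  # evaluate clock components
            if self.prev_clk == 0 and clk == 1:  # detect active clock edge
                self.on_edge(self.time)  # update registers and memories
            self.propagate(self.time)  # propagate combinational components
            self.prev_clk = clk
            self.time += 1  # advance simulation time


edges: list[int] = []
k = Kernel(testbench=[lambda t: t < 6], clock=lambda t: t % 2,
           on_edge=edges.append, propagate=lambda t: None)
k.run()
print(edges, k.time)  # [1, 3, 5] 7
```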

A further embodiment of the present invention is a method of simulating a circuit, where the circuit has a structure and a function specified in a hardware language, such as HDL. The hardware language is also capable of describing or reducing the circuit into components. The method steps comprise: (1) determining component type in the hardware language; (2) generating a model of the circuit based on component type; and (3) simulating the behavior of the circuit with the model by providing input data to the model. Generating the model may include: (1) generating a software model of the circuit; and (2) generating a hardware model of the circuit based on component type.

In another embodiment, the present invention is a method of simulating a circuit. The steps include: (1) generating a software model of the circuit; (2) generating a hardware model of the circuit; (3) simulating a behavior of the circuit with the software model by providing input data to the software model; (4) selectively switching to the hardware model; (5) providing input data to the hardware model; and (6) simulating a behavior of the circuit with the hardware model by accelerating the simulation in the hardware model. The method may also include the additional steps of: (1) selectively switching to the software model; and (2) simulating a behavior of the circuit with the software model by providing input data to the software model. The simulation can also be stopped with the software model.

For the in-circuit emulation mode, the method comprises: (1) generating a software model of the circuit; (2) generating a hardware model of at least a portion of the circuit; (3) providing input signals from the target system to the hardware model; (4) providing output signals from the hardware model to the target system; (5) simulating a behavior of the circuit with the hardware model, where the software model is capable of controlling the simulation/emulation, cycle by cycle.

For the post-simulation analysis, the method of simulating a circuit comprises: (1) generating a model of the circuit; (2) simulating a behavior of the circuit with the model by providing input data to the model; and (3) logging selective input data and selective output data as log points from the model. A software and hardware model can be generated. The method may further comprise the steps of: (1) selecting a desired time-dependent point in the simulation; (2) selecting a log point at or prior to the selected time-dependent point; (3) providing input data to the hardware model; and (4) simulating a behavior of the circuit with the hardware model from the selected log point.

A further embodiment of the present invention is a method of generating models for a simulation system for simulating a circuit. The steps include: (1) generating a software model of the circuit; (2) generating a hardware model for at least a portion of the circuit based on component type, said component type including register components and combinational components; and (3) generating a clock generation circuit in the hardware model to trigger data evaluation in the hardware model in response to clock edge detection in the software model.

Another aspect of the present invention is a memory Simulation system. The various embodiments of the present invention provide a structure and scheme where the numerous memory blocks associated with the user's design are mapped into the SRAM memory devices in the Simulation system instead of inside the logic devices, which are used to configure and model the user's design. Thus, the memory Simulation system includes a memory state machine, an evaluation state machine, and their associated logic to control and interface with: (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the Simulation system, and (3) the FPGA logic devices which contain the configured and programmed user design that is being debugged.

The FPGA logic device side of the memory Simulation system includes an evaluation state machine, an FPGA bus driver, and a logic interface for each memory block N to interface with the user's own memory interface in the user design to handle: (1) data evaluations among the FPGA logic devices, and (2) write/read memory access between the FPGA logic devices and the SRAM memory devices. In conjunction with the FPGA logic device side, the FPGA I/O controller side includes a memory state machine and interface logic to handle DMA, write, and read operations between: (1) the main computing system and the FPGA logic devices (and the SRAM memory devices, for initialization and memory dump), and (2) the FPGA logic devices and the SRAM memory devices.

The operation of the memory Simulation system in accordance with one embodiment of the present invention is generally as follows. The Simulation write/read cycle is divided into three periods: DMA data transfer, evaluation, and memory access. To indicate the completion of a Simulation write/read cycle, the memory Simulation system can send the DONE signal to, and receive it from, the CTRL_FPGA unit and the computing system. The DATAXSFR signal indicates the occurrence of the DMA data transfer period, in which the computing system and the FPGA logic devices transfer data to each other via the FPGA data bus, namely the high bank bus (FD[63:32]) 1212 and the low bank bus (FD[31:0]) 1213.
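The high and low bank buses are the upper and lower 32-bit halves of the FD bus. A minimal sketch of that bit split, assuming a single 64-bit transfer word (the function names are hypothetical):

```python
LOW_MASK = 0xFFFF_FFFF  # 32-bit mask for one bank half


def split_fd(word: int) -> tuple[int, int]:
    """Split a 64-bit FD word into (high bank FD[63:32], low bank FD[31:0])."""
    if not 0 <= word < 1 << 64:
        raise ValueError("FD bus word must fit in 64 bits")
    return (word >> 32) & LOW_MASK, word & LOW_MASK


def join_fd(high: int, low: int) -> int:
    """Recombine the two 32-bit bank halves into one 64-bit FD word."""
    return ((high & LOW_MASK) << 32) | (low & LOW_MASK)


high, low = split_fd(0x1234_5678_9ABC_DEF0)
print(hex(high), hex(low))  # 0x12345678 0x9abcdef0
```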

During the evaluation period, logic circuitry in each FPGA logic device generates the proper software clock, input enable, and mux enable signals to the user's design logic for data evaluation. Inter-FPGA logic device communication occurs in this period. The CTRL_FPGA unit also starts an evaluation counter to control the duration of the evaluation period; this duration is software-settable.

During the memory access period, the memory Simulation system waits for the high and low bank FPGA logic devices to put their respective address and control signals onto their respective FPGA data buses. These address and control signals are latched in by the CTRL_FPGA unit. If the operation is a write, address, control, and data signals are transported from the FPGA logic devices to their respective SRAM memory devices. If the operation is a read, address, control, and data signals are transported from the SRAM memory devices to their respective FPGA logic devices. At the FPGA logic device side, the FD bus driver places the address and control signals of a memory block onto the FPGA data bus (FD bus). If the operation is a write, the write data is placed on the FD bus for that memory block. If the operation is a read, the double buffer latches in the data for the memory block on the FD bus from the SRAM memory device. This operation continues for each memory block in each FPGA logic device. When all the desired memory blocks in an FPGA logic device have been accessed, the memory Simulation system proceeds to the next FPGA logic device in each bank and begins accessing the memory blocks in that FPGA logic device. After all desired memory blocks in all FPGA logic devices have been accessed, the memory Simulation write/read cycle is complete and the memory Simulation system is idle until the onset of the next memory Simulation write/read cycle.
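The visiting order within one bank (every memory block of one FPGA logic device, then the next device) can be sketched as a generator. This is an illustration under stated assumptions: each device is described by a hypothetical list of (memory block name, operation) pairs, and only the direction of data movement is modeled, not the bus signaling.

```python
from typing import Iterator

# Hypothetical description of one FPGA logic device: its memory blocks and
# the pending operation ("write" or "read") for each block.
Device = list[tuple[str, str]]


def memory_access_period(bank: list[Device]) -> Iterator[tuple[int, str, str]]:
    """Yield (device index, memory block, data direction) in access order."""
    for dev_index, blocks in enumerate(bank):  # one FPGA logic device at a time
        for name, op in blocks:  # every memory block in that device
            if op == "write":
                direction = "FPGA->SRAM"  # address, control and write data go to SRAM
            elif op == "read":
                direction = "SRAM->FPGA"  # read data is latched by the double buffer
            else:
                raise ValueError(f"unknown operation {op!r}")
            yield dev_index, name, direction


high_bank = [[("M0", "write"), ("M1", "read")], [("M2", "read")]]
for access in memory_access_period(high_bank):
    print(access)
# (0, 'M0', 'FPGA->SRAM')
# (0, 'M1', 'SRAM->FPGA')
# (1, 'M2', 'SRAM->FPGA')
```

In the described system the high and low banks proceed concurrently, so the same traversal would run once per bank.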

These and other embodiments are fully discussed and illustrated in the following sections of the specification.

BRIEF DESCRIPTION OF THE FIGURES

The above objects and description of the present invention may be better understood with the aid of the following text and accompanying drawings.

FIG. 1 shows a high level overview of one embodiment of the present invention, including the workstation, reconfigurable hardware emulation model, emulation interface, and the target system coupled to a PCI bus.

FIG. 2 shows one particular usage flow diagram of the present invention.

FIG. 3 shows a high level diagram of the software compilation and hardware configuration during compile time and run time in accordance with one embodiment of the present invention.

FIG. 4 shows a flow diagram of the compilation process, which includes generating the software/hardware models and the software kernel code.

FIG. 5 shows the software kernel that controls the overall SEmulation system.

FIG. 6 shows a method of mapping hardware models to reconfigurable boards through mapping, placement, and routing.

FIG. 7 shows the connectivity matrix for the FPGA array shown in FIG. 8.

FIG. 8 shows one embodiment of the 4×4 FPGA array and their interconnections.

FIGS. 9(A), 9(B), and 9(C) illustrate one embodiment of the time division multiplexed (TDM) circuit which allows a group of wires to be coupled together in a time multiplexed fashion so that one pin, instead of a plurality of pins, can be used for this group of wires in a chip. FIG. 9(A) presents an overview of the pin-out problem, FIG. 9(B) provides a TDM circuit for the transmission side, and FIG. 9(C) provides a TDM circuit for the receiver side.

FIG. 10 shows a SEmulation system architecture in accordance with one embodiment of the present invention.

FIG. 11 shows one embodiment of address pointer of the present invention.

FIG. 12 shows a state transition diagram of the address pointer initialization for the address pointer of FIG. 11.

FIG. 13 shows one embodiment of the MOVE signal generator for derivatively generating the various MOVE signals for the address pointer.

FIG. 14 shows the chain of multiplexed address pointers in each FPGA chip.

FIG. 15 shows one embodiment of the multiplexed cross chip address pointer chain in accordance with one embodiment of the present invention.

FIG. 16 shows a flow diagram of the clock/data network analysis that is critical for the software clock implementation and the evaluation of logic components in the hardware model.

FIG. 17 shows a basic building block of the hardware model in accordance with one embodiment of the present invention.

FIGS. 18(A) and 18(B) show the register model implementation for latches and flip-flops.

FIG. 19 shows one embodiment of the clock edge detection logic in accordance with one embodiment of the present invention.

FIG. 20 shows a four-state finite state machine to control the clock edge detection logic of FIG. 19 in accordance with one embodiment of the present invention.

FIG. 21 shows the interconnection, JTAG, FPGA bus, and global signal pin designations for each FPGA chip in accordance with one embodiment of the present invention.

FIG. 22 shows one embodiment of the FPGA controller between the PCI bus and the FPGA array.

FIG. 23 shows a more detailed illustration of the CTRL_FPGA unit and data buffer which were discussed with respect to FIG. 22.

FIG. 24 shows the 4×4 FPGA array, its relationship to the FPGA banks, and expansion capability.

FIG. 25 shows one embodiment of the hardware start-up method.

FIG. 26 shows the HDL code for one example of a user circuit design to be modeled and simulated.

FIG. 27 shows a circuit diagram that symbolically represents the circuit design of the HDL code in FIG. 26.

FIG. 28 shows the component type analysis for the HDL code of FIG. 26.

FIG. 29 shows a signal network analysis of a structured RTL HDL code based on the user's custom circuit design shown in FIG. 26.

FIG. 30 shows the software/hardware partition result for the same hypothetical example.

FIG. 31 shows a hardware model for the same hypothetical example.

FIG. 32 shows one particular hardware model-to-chip partition result for the same hypothetical example of a user's custom circuit design.

FIG. 33 shows another particular hardware model-to-chip partition result for the same hypothetical example of a user's custom circuit design.

FIG. 34 shows the logic patching operation for the same hypothetical example of a user's custom circuit design.

FIGS. 35(A) to 35(D) illustrate the principle of "hops" and interconnections with two examples.

FIG. 36 shows an overview of the FPGA chip used in the present invention.

FIG. 37 shows the FPGA interconnection buses on the FPGA chip.

FIGS. 38(A) and 38(B) show side views of the FPGA board connection scheme in accordance with one embodiment of the present invention.

FIG. 39 shows a direct-neighbor and one-hop six-board interconnection layout of the FPGA array in accordance with one embodiment of the present invention.

FIGS. 40(A) and 40(B) show FPGA inter-board interconnection scheme.

FIGS. 41(A) to 41(F) show top views of the board interconnection connectors.

FIG. 42 shows on-board connectors and some components in a representative FPGA board.

FIG. 43 shows a legend of the connectors in FIGS. 41(A) to 41(F) and 42.

FIG. 44 shows a direct-neighbor and one-hop dual-board interconnection layout of the FPGA array in accordance with another embodiment of the present invention.

FIG. 45 shows a workstation with multiprocessors in accordance with another embodiment of the present invention.

FIG. 46 shows an environment in accordance with another embodiment of the present invention in which multiple users share a single simulation/emulation system on a time-shared basis.

FIG. 47 shows a high level structure of the Simulation server in accordance with one embodiment of the present invention.

FIG. 48 shows the architecture of the Simulation server in accordance with one embodiment of the present invention.

FIG. 49 shows a flow diagram of the Simulation server.

FIG. 50 shows a flow diagram of the job swapping process.

FIG. 51 shows the signals between the device driver and the reconfigurable hardware unit.

FIG. 52 illustrates the time-sharing feature of the Simulation server for handling multiple jobs with different levels of priorities.

FIG. 53 shows the communication handshake signals between the device driver and the reconfigurable hardware unit.

FIG. 54 shows the state diagram of the communication handshake protocol.

FIG. 55 shows an overview of the client-server model of the Simulation server in accordance with one embodiment of the present invention.

FIG. 56 shows a high level block diagram of the Simulation system for implementing memory mapping in accordance with one embodiment of the present invention.

FIG. 57 shows a more detailed block diagram of the memory mapping aspect of the Simulation system with supporting components for the memory finite state machine (MEMFSM) and the evaluation finite state machine for each FPGA logic device (EVALFSMx).

FIG. 58 shows a state diagram of a finite state machine of the MEMFSM unit in the CTRL_FPGA unit in accordance with one embodiment of the present invention.

FIG. 59 shows a state diagram of a finite state machine in each FPGA chip in accordance with one embodiment of the present invention.

FIG. 60 shows the memory read data double buffer.

FIG. 61 shows the Simulation write/read cycle in accordance with one embodiment of the present invention.

FIG. 62 shows a timing diagram of the Simulation data transfer operation when the DMA read operation occurs after the CLK_EN signal.

FIG. 63 shows a timing diagram of the Simulation data transfer operation when the DMA read operation occurs near the end of the EVAL period.

These figures will be discussed below with respect to several different aspects and embodiments of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

This specification will describe the various embodiments of the present invention through and within the context of a system called "SEmulator" or "SEmulation" system. Throughout the specification, the terms "SEmulation system," "SEmulator system," "SEmulator," or simply "system" may be used. These terms refer to various apparatus and method embodiments in accordance with the present invention for any combination of four operating modes: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis, including their respective set-up or pre-processing stages. At other times, the term "SEmulation" may be used. This term refers to the novel processes described herein.

The specification also makes references to a "user" and a user's "circuit design" or "electronic design." The "user" is a person who uses the SEmulation system through its interfaces and may be the designer of a circuit or a tester/debugger who played little or no part in the design process. The "circuit design" or "electronic design" is a custom designed system or component, whether software or hardware, which can be modeled by the SEmulation system for test/debug purposes. In many cases, the "user" also designed the "circuit design" or "electronic design."

The specification also uses the terms "wire," "wire line," "wire/bus line," and "bus." These terms refer to various electrically conducting lines. Each line may be a single wire between two points or several wires between points. These terms are interchangeable in that a "wire" may comprise one or more conducting lines and a "bus" may also comprise one or more conducting lines.

This specification is presented in outline form. First, the specification presents a general overview of the SEmulator system, including an overview of the four operating modes and the hardware implementation schemes. Second, the specification provides a detailed discussion of the SEmulator system. In some cases, one figure may provide a variation of an embodiment shown in a previous figure. In these cases, like reference numerals will be used for like components/units/processes. The outline of the specification is as follows:

I. Overview

A. Simulation/Hardware Acceleration Modes

B. Emulation With Target System Mode

C. Post-Simulation Analysis Mode

D. Hardware Implementation Schemes

E. Simulation Server

F. Memory Simulation

II. System Description

III. Simulation/Hardware Acceleration Modes

IV. Emulation With Target System Mode

V. Post-Simulation Analysis Mode

VI. Hardware Implementation Schemes

A. Overview

B. Address Pointer

C. Gated Data/Clock Network Analysis

D. FPGA Array and Control

E. Alternate Embodiment Using Denser FPGA Chips

VII. Simulation Server

VIII. Memory Simulation

IX. Examples

I. OVERVIEW

The various embodiments of the present invention have four general modes of operation: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation, and (4) post-simulation analysis. The various embodiments include the system and method of these modes with at least some of the following features:

(1) a software and hardware model having a single tightly coupled simulation engine, a software kernel, which controls the software and hardware models cycle by cycle; (2) automatic component type analysis during the compilation process for software and hardware model generation and partitioning; (3) ability to switch (cycle by cycle) among software simulation mode, simulation through hardware acceleration mode, in-circuit emulation mode, and post-simulation analysis mode; (4) full hardware model visibility through software combinational component regeneration; (5) double-buffered clock modeling with software clocks and gated clock/data logic to avoid race conditions; and (6) ability to re-simulate or hardware accelerate the user's circuit design from any selected point in a past simulation session. The end result is a flexible and fast simulator/emulator system and method with full HDL functionality and emulator execution performance.

A. Simulation/Hardware Acceleration Modes

The SEmulator system, through automatic component type analysis, can model the user's custom circuit design in software and hardware. The entire user circuit design is modeled in software, whereas evaluation components (i.e., register components and combinational components) are modeled in hardware. Hardware modeling is facilitated by the component type analysis.

A software kernel, residing in the main memory of the general purpose processor system, serves as the SEmulator system's main program that controls the overall operation and execution of its various modes and features. So long as any test-bench processes are active, the kernel evaluates active test-bench components, evaluates clock components, detects clock edges to update registers and memories, propagates combinational logic data, and advances the simulation time. This software kernel provides the tight coupling between the simulator engine and the hardware acceleration engine. For the software/hardware boundary, the SEmulator system provides a number of I/O address spaces: REG (register), CLK (software clock), S2H (software to hardware), and H2S (hardware to software).

The SEmulator has the capability to selectively switch among the four modes of operation. The user of the system can start simulation, stop simulation, assert input values, inspect values, single step cycle by cycle, and switch back and forth among the four different modes. For example, the system can simulate the circuit in software for a time period, accelerate the simulation through the hardware model, and then return to software simulation mode.

Generally, the SEmulation system provides the user with the capability to "see" every modeled component, regardless of whether it is modeled in software or hardware. For a variety of reasons, combinational components are not as "visible" as registers, and thus, obtaining combinational component data is difficult. One reason is that FPGAs, which are used in the reconfigurable board to model the hardware portion of the user's circuit design, typically model combinational components as look-up tables (LUTs) instead of actual combinational components. Accordingly, the SEmulation system reads register values and then regenerates combinational components. Because some overhead is needed to regenerate the combinational components, this regeneration process is not performed all the time; rather, it is done only upon the user's request.
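As an illustration of regeneration, the sketch below recomputes combinational signals from read-back register values by evaluating a netlist in topological order. The netlist format, signal names, and gate functions are hypothetical; in the actual system the combinational logic is whatever the user's design specifies.

```python
from typing import Callable

# Hypothetical combinational netlist: each entry is (output name, function,
# input names), listed in topological order so inputs are computed first.
Netlist = list[tuple[str, Callable[..., int], tuple[str, ...]]]


def regenerate(registers: dict[str, int], netlist: Netlist) -> dict[str, int]:
    """Recompute every combinational signal from the read-back register values."""
    values = dict(registers)
    for out, fn, inputs in netlist:
        values[out] = fn(*(values[i] for i in inputs))
    return values


netlist: Netlist = [
    ("n1", lambda a, b: a & b, ("r0", "r1")),  # AND of two registers
    ("n2", lambda x, c: x ^ c, ("n1", "r2")),  # XOR with a third register
]
print(regenerate({"r0": 1, "r1": 1, "r2": 1}, netlist)["n2"])  # 0
```

Because only register values cross the hardware/software boundary, the cost of regeneration is the software evaluation of the netlist, which is why it is worth doing only on request.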

Because the software kernel resides in the software side, a clock edge detection mechanism is provided to trigger the generation of a so-called software clock that drives the enable input to the various registers in the hardware model. The timing is strictly controlled through a double-buffered circuit implementation so that the software clock enable signal enters the register model before the data reaches these models. Once the data input to these register models has stabilized, the software clock gates the data synchronously to ensure that all data values are gated together without any risk of hold-time violations.
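Gating all register inputs together is what prevents a value from racing through several registers on one edge. A software analogy (not the patent's circuit) is a two-phase update: sample every stabilized D input first, then commit all of them at once. The signal names below are hypothetical.

```python
def clock_registers(q: dict[str, int], d_inputs: dict[str, str]) -> dict[str, int]:
    """Apply one software-clock enable to every register at once.

    q holds current signal values (primary inputs and register outputs);
    d_inputs maps each register to the signal driving its D input. Phase 1
    samples all stabilized D values; phase 2 commits them together, so no
    register sees another register's new value during the same edge.
    """
    sampled = {reg: q[src] for reg, src in d_inputs.items()}  # phase 1: sample
    return {**q, **sampled}  # phase 2: commit together


# A three-stage shift register: in -> r0 -> r1 -> r2
state = {"in": 1, "r0": 0, "r1": 0, "r2": 0}
chain = {"r0": "in", "r1": "r0", "r2": "r1"}
state = clock_registers(state, chain)
print(state)  # {'in': 1, 'r0': 1, 'r1': 0, 'r2': 0}
```

Updating the registers one at a time in place would instead push the 1 through all three stages on a single edge, which is the software counterpart of a hold-time violation.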

Software simulation is also fast because the system logs all input values and only selected register values/states; overhead is thus minimized by decreasing the number of I/O operations. The user can select the logging frequency.

B. Emulation With Target System Mode

The SEmulation system is capable of emulating the user's circuit within its target system environment. The target system outputs data to the hardware model for evaluation and the hardware model also outputs data to the target system. Additionally, the software kernel controls the operation of this mode so that the user still has the option to start, stop, assert values, inspect values, single step, and switch from one mode to another.

C. Post-Simulation Analysis Mode

Logs provide the user with a historical record of the simulation session. Unlike known simulation systems, the SEmulation system does not log every single value, internal state, or value change during the simulation process. The SEmulation system logs only selected values and states based on a logging frequency (i.e., log 1 record every N cycles). During the post-simulation stage, if the user wants to examine various data around point X in the just-completed simulation session, the user goes to the logged point closest to and prior to point X, say logged point Y. The user then simulates from that selected logged point Y to the desired point X to obtain simulation results.
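Choosing the restart point amounts to a search for the last logged cycle at or before the target cycle. A minimal sketch, assuming a log point is taken at cycle 0 and then every N cycles (the function names are hypothetical):

```python
import bisect


def log_points(total_cycles: int, n: int) -> list[int]:
    """Cycles at which a record is logged when logging 1 record every n cycles."""
    return list(range(0, total_cycles + 1, n))


def restart_point(points: list[int], x: int) -> int:
    """Closest logged point at or before cycle x (logged point Y in the text)."""
    i = bisect.bisect_right(points, x)
    if i == 0:
        raise ValueError("no log point at or before the requested cycle")
    return points[i - 1]


points = log_points(1000, 100)
y = restart_point(points, 437)
print(y, 437 - y)  # 400 37
```

With logging every 100 cycles, examining cycle 437 requires re-simulating only 37 cycles from log point 400, rather than 437 cycles from the start.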

D. Hardware Implementation Schemes

The SEmulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the SEmulation system partitions, maps, places, and routes each selected portion of the user's circuit design onto the FPGA chips. Thus, for example, a 4.times.4 array of 16 chips may be modeling a large circuit spread out across these 16 chips. The interconnect scheme allows each chip to access another chip within 2 "jumps" or links.

Each FPGA chip implements an address pointer for each of the I/O address spaces (i.e., REG, CLK, S2H, H2S). All address pointers associated with a particular address space are chained together. So, during data transfer, word data in each chip is sequentially selected from/to the main FPGA bus and PCI bus, one word at a time for the selected address space in each chip, and one chip at a time, until the desired word data have been accessed for that selected address space. This sequential selection of word data is accomplished by a propagating word selection signal. This word selection signal travels through the address pointer in a chip, then propagates to the address pointer in the next chip, and continues until it reaches the last chip or the system initializes the address pointer.
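The propagating word selection signal can be modeled in software as a single token that each MOVE pulse advances by one word, handing over to the next chip after the current chip's last word. This is a behavioral sketch only, not the address pointer circuit; the per-chip word counts are a hypothetical parameter.

```python
from typing import Optional


class AddressPointerChain:
    """Word-selection token passed through the chained address pointers of
    one address space. words_per_chip[i] is the number of words chip i holds
    for that address space (hypothetical)."""

    def __init__(self, words_per_chip: list[int]) -> None:
        self.words_per_chip = words_per_chip
        self.reset()

    def reset(self) -> None:
        """Initialize: the token starts at word 0 of the first non-empty chip."""
        self.chip, self.word = 0, 0
        self._skip_empty_chips()

    def _skip_empty_chips(self) -> None:
        while self.chip < len(self.words_per_chip) and self.words_per_chip[self.chip] == 0:
            self.chip += 1

    def selected(self) -> Optional[tuple[int, int]]:
        """(chip, word) currently selected, or None once past the last chip."""
        if self.chip >= len(self.words_per_chip):
            return None
        return self.chip, self.word

    def move(self) -> None:
        """One MOVE pulse: next word in this chip, or hand off to the next chip."""
        if self.selected() is None:
            return
        self.word += 1
        if self.word == self.words_per_chip[self.chip]:
            self.chip, self.word = self.chip + 1, 0
            self._skip_empty_chips()


chain = AddressPointerChain([2, 0, 3])
order = []
while (sel := chain.selected()) is not None:
    order.append(sel)
    chain.move()
print(order)  # [(0, 0), (0, 1), (2, 0), (2, 1), (2, 2)]
```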

The FPGA bus system in the reconfigurable board is twice as wide as the PCI bus but operates at half the PCI bus speed. The FPGA chips are thus separated into banks to utilize the wider bus. The throughput of this FPGA bus system can track the throughput of the PCI bus system, so performance is not lost by reducing the bus speed. Expansion is possible through piggyback boards that extend the bank length.
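If the FPGA bus carries twice as many bits per transfer as the PCI bus at half the clock rate, the two throughputs are equal. The worked arithmetic below assumes a conventional 32-bit, 33 MHz PCI bus and a 64-bit FPGA bus (FD[63:0]); these figures are illustrative and are not stated in this passage.

```latex
\underbrace{32\ \text{bits} \times 33\ \text{MHz}}_{\text{PCI bus}}
  = \underbrace{64\ \text{bits} \times 16.5\ \text{MHz}}_{\text{FPGA bus}}
  = 1056\ \text{Mbit/s} = 132\ \text{MB/s}
```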

In another embodiment of the present invention, denser FPGA chips are used. Two such denser chips are the Altera 10K130V and 10K250V. Use of these chips alters the board design such that only four FPGA chips, instead of eight less dense FPGA chips (e.g., Altera 10K100), are used per board.

The FPGA array in the Simulation system is provided on the motherboard through a particular board interconnect structure. Each chip may have up to eight sets of interconnections, where the interconnections are arranged according to adjacent direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]), and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]), excluding the local bus connections, within a single board and across different boards. Each chip is capable of being interconnected directly to adjacent neighbor chips, or in one hop to a non-adjacent chip located above, below, left, and right. In the X direction (east-west), the array is a torus. In the Y direction (north-south), the array is a mesh.
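The torus/mesh distinction can be made concrete by computing a chip's neighbors: column indices wrap around in X, while rows stop at the array edge in Y. A minimal sketch, assuming a one-hop neighbor is the chip two positions away in the same row or column; the labels EH and WH are hypothetical names for the east-west one-hop links that the text groups as XH.

```python
def neighbors(row: int, col: int, rows: int, cols: int) -> dict[str, tuple[int, int]]:
    """Direct-neighbor and one-hop neighbor coordinates of chip (row, col)."""
    result: dict[str, tuple[int, int]] = {}
    # X (east-west) is a torus: column indices wrap around.
    for label, dc in (("E", 1), ("W", -1), ("EH", 2), ("WH", -2)):
        result[label] = (row, (col + dc) % cols)
    # Y (north-south) is a mesh: positions past the edge have no neighbor.
    for label, dr in (("N", -1), ("S", 1), ("NH", -2), ("SH", 2)):
        r = row + dr
        if 0 <= r < rows:
            result[label] = (r, col)
    return result


print(neighbors(0, 0, 4, 4))
# {'E': (0, 1), 'W': (0, 3), 'EH': (0, 2), 'WH': (0, 2), 'S': (1, 0), 'SH': (2, 0)}
```

In a four-column torus both one-hop links from a chip land on the same opposite chip, while a chip in the top row has no north or north-hop neighbor.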

The interconnects alone can couple logic devices and other components within a single board. However, inter-board connectors are provided to couple these boards and interconnects together across different boards to carry signals between (1) the PCI bus via the motherboard and the array boards, and (2) any two array boards.

A motherboard connector connects the board to the motherboard, and hence, to the PCI bus, power, and ground. For some boards, the motherboard connector is not used for direct connection to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard while the remaining boards 2, 4, and 6 rely on their neighbor boards for motherboard connectivity. Thus, every other board is directly connected to the motherboard, and interconnects and local buses of these boards are coupled together via inter-board connectors arranged solder-side to component-side. PCI signals are routed through one of the boards (typically the first board) only. Power and ground are applied to the other motherboard connectors for those boards. Placed solder-side to component-side, the various inter-board connectors allow communication among the PCI bus components, the FPGA logic devices, memory devices, and various Simulation system control circuits.

E. Simulation Server

In another embodiment of the present invention, a Simulation server is provided to allow multiple users to access the same reconfigurable hardware unit. In one system configuration, multiple workstations across a network or multiple users/processes in a non-network environment can access the same server-based reconfigurable hardware unit to review/debug the same or different user circuit design. The access is accomplished via a time-shared process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware model access among the scheduled users. In one scenario, each user can access the server to map his/her separate user design to the reconfigurable hardware model for the first time, in which case the system compiles the design to generate the software and hardware models, performs the clustering operation, performs place-and-route operations, generates a bitstream configuration file, and reconfigures the FPGA chips in the reconfigurable hardware unit to model the hardware portion of the user's design. When one user has accelerated his design using the hardware model and downloaded the hardware state to his own memory for software simulation, the hardware unit can be released for access by another user.

The server allows multiple users or processes to access the reconfigurable hardware unit for acceleration and hardware state swapping purposes. The Simulation server includes the scheduler, one or more device drivers, and the reconfigurable hardware unit. The scheduler in the Simulation server is based on a preemptive round robin algorithm. The server scheduler includes a simulation job queue table, a priority sorter, and a job swapper. The restore and playback function of the present invention facilitates the non-network multiprocessing environment as well as the network multi-user environment in which previous checkpoint state data can be downloaded and the entire simulation state associated with that checkpoint can be restored for playback debugging or cycle-by-cycle stepping.
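One way to combine a priority sorter with preemptive round robin is strict priority between levels and round robin within a level, with a job swapped out when its time slice expires. The sketch below is such a combination under stated assumptions, not necessarily the server's exact policy: the job table format, the quantum, and the convention that a lower number means higher priority are all hypothetical, and new arrivals and swap costs are not modeled.

```python
from collections import deque


def schedule(jobs: dict[str, tuple[int, int]], quantum: int) -> list[str]:
    """Return the order in which jobs are granted the hardware unit.

    jobs maps a job name to (priority, hardware time still needed); a lower
    priority number is served first. Each grant lasts one quantum; an
    unfinished job is swapped out and re-queued at the tail of its level.
    """
    queues: dict[int, deque] = {}  # the job queue table, one queue per level
    for name, (prio, _) in jobs.items():
        queues.setdefault(prio, deque()).append(name)
    remaining = {name: need for name, (_, need) in jobs.items()}
    timeline = []
    while any(queues.values()):
        prio = min(p for p, q in queues.items() if q)  # priority sorter
        name = queues[prio].popleft()
        timeline.append(name)  # job swapper loads this job's hardware state
        remaining[name] -= quantum
        if remaining[name] > 0:
            queues[prio].append(name)  # preempted: swap out, back of its queue
    return timeline


print(schedule({"A": (0, 3), "B": (0, 2), "C": (1, 1)}, quantum=1))
# ['A', 'B', 'A', 'B', 'A', 'C']
```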

F. Memory Simulation

The Memory Simulation, or memory mapping, aspect of the present invention provides an effective way for the Simulation system to manage the various memory blocks associated with the configured hardware model of the user's design, which was programmed into the array of FPGA chips in the reconfigurable hardware unit. It provides a structure and scheme in which the numerous memory blocks associated with the user's design are mapped into the SRAM memory devices in the Simulation system instead of into the logic devices that are used to configure and model the user's design. The Memory Simulation system includes a memory state machine, an evaluation state machine, and their associated logic to control and interface with: (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the Simulation system, and (3) the FPGA logic devices which contain the configured and programmed user design that is being debugged. In accordance with one embodiment of the present invention, the Memory Simulation system operates generally as follows. The Simulation write/read cycle is divided into three periods: DMA data transfer, evaluation, and memory access.

The FPGA logic device side of the Memory Simulation system includes an evaluation state machine, an FPGA bus driver, and, for each memory block N, a logic interface to the user's own memory interface in the user design. Together these handle: (1) data evaluations among the FPGA logic devices, and (2) write/read memory access between the FPGA logic devices and the SRAM memory devices. On the other side, the FPGA I/O controller includes a memory state machine and interface logic to handle DMA, write, and read operations between: (1) the main computing system and the SRAM memory devices, and (2) the FPGA logic devices and the SRAM memory devices.
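The following Python sketch illustrates the idea of placing user memory blocks in SRAM and running the three-period write/read cycle. It is illustrative only and assumes a simple layout: every memory block gets a contiguous region of one flat SRAM array. The names `SramMap` and `run_cycle`, the request tuple format, and the `evaluate` callback, which stands in for the FPGA evaluation state machine, are all hypothetical. The specification does not give the actual address layout or bus protocol.

```python
class SramMap:
    """Hypothetical layout of user memory blocks in one flat SRAM address space."""

    def __init__(self, sram_words: int) -> None:
        self.sram = [0] * sram_words
        self.base: dict[str, int] = {}   # memory block name -> SRAM base address
        self.size: dict[str, int] = {}   # memory block name -> depth in words
        self.next_free = 0

    def allocate(self, block: str, words: int) -> int:
        """Give memory block `block` a contiguous SRAM region; return its base."""
        if self.next_free + words > len(self.sram):
            raise MemoryError(f"SRAM full while mapping {block}")
        self.base[block], self.size[block] = self.next_free, words
        self.next_free += words
        return self.base[block]

    def _addr(self, block: str, addr: int) -> int:
        if not 0 <= addr < self.size[block]:
            raise IndexError(f"{block}[{addr}] out of range")
        return self.base[block] + addr

    def write(self, block: str, addr: int, value: int) -> None:
        self.sram[self._addr(block, addr)] = value

    def read(self, block: str, addr: int) -> int:
        return self.sram[self._addr(block, addr)]


def run_cycle(smap, host_data, evaluate):
    """One write/read cycle split into the three periods named in the text.

    host_data -- {(block, addr): value} loaded from the main computing system
    evaluate  -- stand-in for FPGA evaluation; returns memory requests as
                 ("r", block, addr, None) or ("w", block, addr, value)
    """
    phases = []
    # 1. DMA data transfer: main computing system -> SRAM.
    for (block, addr), value in host_data.items():
        smap.write(block, addr, value)
    phases.append("DMA")
    # 2. Evaluation: FPGA logic evaluates and collects memory requests.
    requests = evaluate()
    phases.append("evaluation")
    # 3. Memory access: FPGA logic devices <-> SRAM.
    read_data = []
    for op, block, addr, value in requests:
        if op == "w":
            smap.write(block, addr, value)
        else:
            read_data.append(smap.read(block, addr))
    phases.append("memory access")
    return phases, read_data
```

With this layout, a memory block's size is limited only by the free SRAM words. It is not limited by logic-device resources, and that is the motivation the text gives for mapping memories into SRAM.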

II. SYSTEM DESCRIPTION

FIG. 1 shows a high level overview of one embodiment of the present invention. A workstation 10 is coupled to a reconfigurable hardware model 20 and emulation interface 30 via PCI bus system 50. The reconfigurable hardware model 20 is coupled to the emulation interface 30 via PCI bus 50, as well as cable 61. A target system 40 is coupled to the emulation interface 30 via cables 60. In other embodiments, the in-circuit emulation set-up 70, which comprises the emulation interface 30 and target system 40 (shown in the dotted line box), is not provided when emulation of the user's circuit design within the target system's environment is not desired during a particular test/debug session. Without the in-circuit emulation set-up 70, the reconfigurable hardware model 20 communicates with the workstation 10 via the PCI bus 50.

In combination with the in-circuit emulation set-up 70, the reconfigurable hardware model 20 imitates or mimics the user's circuit design of some electronic subsystem in the target system. To ensure the correct operation of the user's circuit design of the electronic subsystem within the target system's environment, input and output signals between the target system 40 and the modeled electronic subsystem must be provided to the reconfigurable hardware model 20 for evaluation. Hence, the input and output signals of the target system 40 to/from the reconfigurable hardware model 20 are delivered via cables 60 through the emulation interface 30 and the PCI bus 50. Alternatively, input/output signals of the target system 40 can be delivered to the reconfigurable hardware model 20 via emulation interface 30 and cables 61.

The control data and some substantive simulation data pass between the reconfigurable hardware model 20 and the workstation 10 via the PCI bus 50. Indeed, the workstation 10 runs the software kernel that controls the operation of the entire SEmulation system and must have access (read/write) to the reconfigurable hardware model 20.

A workstation 10 complete with a computer, keyboard, mouse, monitor and appropriate bus/network interface allows a user to enter and modify data describing the circuit design of an electronic system. Exemplary workstations include a Sun Microsystems SPARC or ULTRA-SPARC workstation or an Intel/Microsoft-based computing station. As known to those ordinarily skilled in the art, the workstation 10 comprises a CPU 11, a local bus 12, a host/PCI bridge 13, memory bus 14, and main memory 15. The various software simulation, simulation by hardware acceleration, in-circuit emulation, and post-simulation analysis aspects of the present invention are provided in the workstation 10, reconfigurable hardware model 20, and emulation interface 30. The algorithm embodied in software is stored in main memory 15 during a test/debug session and executed through the CPU 11 via the workstation's operating system.

As known to those ordinarily skilled in the art, after the operating system is loaded into the memory of workstation 10 by the start-up firmware, control passes to its initialization code to set up necessary data structures, and load and initialize device drivers. Control is then passed to the command line interpreter (CLI), which prompts the user to indicate the program to be run. The operating system then determines the amount of memory needed to run the program, locates the block of memory, or allocates a block of memory and accesses the memory either directly or through BIOS. After completion of the memory loading process, the application program begins execution.

One embodiment of the present invention is a particular application program for SEmulation. During the course of its execution, the application program may require numerous services from the operating system, including, but not limited to, reading from and writing to disk files, performing data communications, and interfacing with the display/keyboard/mouse.

The workstation 10 has the appropriate user interface to allow the user to enter the circuit design data, edit the circuit design data, monitor the progress of simulations and emulations while obtaining results, and essentially control the simulation and emulation process. Although not shown in FIG. 1, the user interface includes user-accessible menu-driven options and command sets which can be entered with the keyboard and mouse and viewed with a monitor. Typically, the user uses a computing station 80 with a keyboard 90.

The user typically creates a particular circuit design of an electronic system and enters an HDL (usually structured RTL level) code description of the designed system into the workstation 10. The SEmulation system of the present invention performs component type analysis, among other operations, for partitioning the modeling between software and hardware. The SEmulation system models behavior, RTL, and gate level code in software. For hardware modeling, the system can model RTL and gate level code; however, the RTL level must be synthesized to gate level prior to hardware modeling. The gate level code can be processed directly into usable source design database format for hardware modeling. Using the RTL and gate level codes, the system automatically performs component type analysis to complete the partition step. Based on the partitioning analysis during software compile time, the system maps some portion of the circuit design into hardware for fast simulation via hardware acceleration. The user can also couple the modeled circuit design to the target system for real environment in-circuit emulation. Because the software simulation and hardware acceleration engines are tightly coupled through the software kernel, the user can simulate the overall circuit design using software simulation, accelerate the test/debug process by using the hardware model of the mapped circuit design, return to software simulation, and return to hardware acceleration until the test/debug process is complete. The ability to switch between software simulation and hardware acceleration cycle-by-cycle, at will, is one of the valuable features of this embodiment. This feature is particularly useful in the debug process because it allows the user to reach a particular point or cycle very quickly using the hardware acceleration mode and then use software simulation to examine various points thereafter to debug the circuit design.
Moreover, the SEmulation system makes all components visible to the user whether the internal realization of the component is in hardware or software. The SEmulation system accomplishes this by reading the register values from the hardware model and then rebuilding the combinational components using the software model when the user requests such a read. These and other features will be discussed more fully later in the specification.

The workstation 10 is coupled to a bus system 50. The bus system can be any available bus system that allows various agents, such as the workstation 10, reconfigurable hardware model 20, and emulation interface 30, to be operably coupled together. Preferably, the bus system is fast enough to provide real-time or near real-time results to the user. One such bus system is the bus system described in the Peripheral Component Interconnect (PCI) standard, which is incorporated herein by reference. Currently, revision 2.0 of the PCI standard provides for a 33 MHz bus speed. Revision 2.1 provides support for 66 MHz bus speed. Accordingly, the workstation 10, reconfigurable hardware model 20, and emulation interface 30 may comply with the PCI standard.

In one embodiment, communication between the workstation 10 and the reconfigurable hardware model 20 is handled on the PCI bus. Other PCI-compliant devices may be found in this bus system. These devices may be coupled to the PCI bus at the same level as the workstation 10, reconfigurable hardware model 20, and emulation interface 30, or other levels. Each PCI bus at a different level, such as PCI bus 52, is coupled to another PCI bus level, such as PCI bus 50, if it exists at all, through a PCI-to-PCI bridge 51. At PCI bus 52, two PCI devices 53 and 54 may be coupled therewith.

The reconfigurable hardware model 20 comprises an array of field-programmable gate array (FPGA) chips that can be programmably configured and reconfigured to model the hardware portion of the user's electronic system design. In this embodiment, the hardware model is reconfigurable; that is, it can reconfigure its hardware to suit the particular computation or user circuit design at hand. If, for example, many adders or multiplexers are required, the system is configured to include many adders and multiplexers. As other computing elements or functions are needed, they may also be modeled or formed in the system. In this way, the system can be optimized to perform specialized computations or logic operations. Reconfigurable systems are also flexible, so that users can work around minor hardware defects that arise during manufacture, testing, or use. In one embodiment, the reconfigurable hardware model 20 comprises a two-dimensional array of computing elements consisting of FPGA chips to provide the computational resources for various user circuit designs and applications. More details on the hardware configuration process will be provided.

Examples of such FPGA chips include those sold by Altera and Xilinx. In some embodiments, the reconfigurable hardware model is reconfigurable via the use of field programmable devices. However, other embodiments of the present invention may be implemented using application specific integrated circuit (ASIC) technology. Still other embodiments may be in the form of a custom integrated circuit.

In a typical test/debug scenario, reconfigurable devices will be used to simulate/emulate the user's circuit design so that appropriate changes can be made prior to actual prototype manufacturing. In some other instances, however, an actual ASIC or custom integrated circuit can be used, although this deprives the user of the ability to quickly and cost-effectively change a possibly non-functional circuit design for re-simulation and re-emulation. At times, though, such an ASIC or custom IC has already been manufactured and is readily available, so that emulation with an actual non-reconfigurable chip may be preferable.

In accordance with the present invention, the software in the workstation, along with its integration with an external hardware model, provides a greater degree of flexibility, control, and performance for the end user over existing systems. To run the simulation and emulation, a model of the circuit design and the relevant parameters (e.g., input test-bench stimulus, overall system output, intermediate results) are determined and provided to the simulation software system. The user can use either schematic capture tools or synthesis tools to define the system circuit design. The user starts with a circuit design of an electronic system, usually in draft schematic form, which is then converted to HDL form using synthesis tools. The HDL can also be directly written by the user. Exemplary HDL languages include Verilog and VHDL; however, other languages are also available. A circuit design represented in HDL comprises many concurrent components. Each component is a sequence of code which either defines the behavior of a circuit element or controls the execution of the simulation.

The SEmulation system analyzes these components to determine their component types and the compiler uses this component type information to build different execution models in software and hardware. Thereafter, the user can use the SEmulation system of the present invention. The designer can verify the accuracy of the circuit through simulation by applying various stimuli such as input signals and test vector patterns to the simulated model. If, during the simulation, the circuit does not behave as planned, the user re-defines the circuit by modifying the circuit schematic or the HDL file.

The use of this embodiment of the present invention is shown in the flow chart of FIG. 2. The algorithm starts at step 100. After loading the HDL file into the system, the system compiles, partitions, and maps the circuit design to appropriate hardware models. The compilation, partition, and mapping steps are discussed in more detail below.

Before the simulation runs, the system must run a reset sequence to remove all unknown "x" values in software, because the hardware acceleration model cannot function with them. One embodiment of the present invention uses a 2-bit wide data path to provide a 4-state value for each bus signal: "00" is logic low, "01" is logic high, "10" is "z," and "11" is "x." As known to those ordinarily skilled in the art, software models can deal with "0," "1," "x" (bus conflicts or unknown value), and "z" (no driver or high impedance). In contrast, hardware cannot deal with the unknown value "x," so the reset sequence, which varies depending on the particular applicable code, resets the register values to all "0" or all "1."
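The following Python sketch uses the 2-bit encoding exactly as listed in the text: "00" is 0, "01" is 1, "10" is z, and "11" is x. It shows how a vector of 4-state values packs into 2 bits per signal. It also models the reset rule that no "x" may reach the hardware model. The helper names and the string representation of signal vectors are illustrative assumptions.

```python
# 2-bit code per signal, as listed in the text: 00 = 0, 01 = 1, 10 = z, 11 = x
ENCODE = {"0": 0b00, "1": 0b01, "z": 0b10, "x": 0b11}
DECODE = {code: value for value, code in ENCODE.items()}


def pack(values: str) -> int:
    """Pack a 4-state vector (leftmost signal first) into 2 bits per signal."""
    word = 0
    for v in values:
        word = (word << 2) | ENCODE[v]
    return word


def unpack(word: int, width: int) -> str:
    """Inverse of pack() for a vector of `width` signals."""
    return "".join(DECODE[(word >> (2 * i)) & 0b11] for i in reversed(range(width)))


def hardware_ready(values: str) -> bool:
    """Hardware cannot evaluate unknowns, so no "x" may remain."""
    return "x" not in values


def reset_registers(values: str, fill: str = "0") -> str:
    """Model the reset sequence: every register bit becomes all-0 or all-1."""
    if fill not in ("0", "1"):
        raise ValueError("reset value must be '0' or '1'")
    return fill * len(values)


regs = "1x0x"
print(pack("01zx"), hardware_ready(regs))                            # 27 False
print(reset_registers(regs), hardware_ready(reset_registers(regs)))  # 0000 True
```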

At step 105, the user decides whether to simulate the circuit design. Typically, a user will start the system with software simulation first. Thus, if the decision at step 105 resolves to "YES," software simulation occurs at step 110.

The user can stop the simulation to inspect values as shown in step 115. Indeed, the user can stop the simulation at any time during the test/debug session as shown by the dotted lines extending from step 115 to various nodes in the hardware acceleration mode, ICE mode, and post-simulation mode. Executing step 115 takes the user to step 160.

After stopping, the system kernel reads back the state of hardware register components to regenerate the entire software model, including the combinational components, if the user wants to inspect combinational component values. After restoring the entire software model, the user can inspect any signal value in the system. After stopping and inspection, the user can continue to run in simulation only mode or hardware model acceleration mode. As shown in the flow chart, step 115 branches to the stop/value inspect routine. The stop/value inspect routine starts at step 160. At step 165, the user must decide whether to stop the simulation at this point and inspect values. If step 165 resolves to "YES," step 170 stops the simulation that may be currently underway and inspects various values to check for correctness of the circuit design. At step 175, the algorithm returns to the point at which it branched, which is at step 115. Here, the user can continue to simulate and stop/inspect values for the remainder of the test/debug session or proceed forward to the in-circuit emulation step.

Similarly, if step 105 resolves to "NO," the algorithm proceeds to the hardware acceleration decision step 120. At step 120, the user decides whether to accelerate the test/debug process by running the simulation through the hardware portion of the modeled circuit design. If the decision at step 120 resolves to "YES," then hardware model acceleration occurs at step 125. During the system compilation process, the SEmulation system mapped some portions into a hardware model. Here, when hardware acceleration is desired, the system moves register and combinational components into the hardware model and moves the input and evaluation values to the hardware model. Thus, during hardware acceleration, evaluation occurs in the hardware model for a long time period at the accelerated speed. The kernel writes test-bench output to the hardware model, updates the software clock, and then reads the hardware model output values cycle-by-cycle. If desired by the user, values from the entire software model of the user's circuit design (that is, the entire circuit design) can be made available by outputting the register values and regenerating the combinational component values from those register values. Because regenerating these combinational components requires software intervention, values for the entire software model are not output at every cycle; rather, they are provided only when the user requests them. This specification will discuss the combinational component regeneration process later.
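The regeneration idea can be sketched in Python as follows. Register values read back from the hardware model are treated as known. Every combinational net is then recomputed in dependency order from the software model. This is a minimal sketch: the netlist format (a net mapped to a function and its input nets) is an assumption. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) for the evaluation order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from operator import and_, or_, xor


def regenerate(registers, comb):
    """Rebuild combinational net values from register values read back from hardware.

    registers -- {register net: value} read from the hardware model
    comb      -- {combinational net: (function, [input nets])} from the software model
    """
    graph = {net: inputs for net, (_, inputs) in comb.items()}
    values = dict(registers)
    for net in TopologicalSorter(graph).static_order():
        if net in comb:  # register nets also appear in the order but are already known
            fn, inputs = comb[net]
            values[net] = fn(*(values[i] for i in inputs))
    return values


# Hypothetical netlist: out = (q0 & q1) | (q0 ^ q1)
comb = {
    "n_and": (and_, ["q0", "q1"]),
    "n_xor": (xor, ["q0", "q1"]),
    "out": (or_, ["n_and", "n_xor"]),
}
print(regenerate({"q0": 1, "q1": 0}, comb)["out"])  # 1
```

Only register state has to cross the bus. Every combinational value can be derived from it, so regeneration cost is paid only when the user asks for the values.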

Again, the user can stop the hardware acceleration mode at any time, as indicated by step 115. If the user wants to stop, the algorithm proceeds to steps 115 and 160 to branch to the stop/value inspect routine. Here, as in the simulation mode, the user can stop the hardware-accelerated simulation process at any time and inspect values resulting from the simulation process, or the user can continue with the hardware-accelerated simulation process. The stop/value inspect routine comprises steps 160, 165, 170, and 175, which were discussed above in the context of stopping the simulation. Returning to the main routine after step 125, the user can decide at step 135 to continue with the hardware-accelerated simulation or perform pure simulation instead. If the user wants to simulate further, the algorithm proceeds to step 105. If not, the algorithm proceeds to the post-simulation analysis at step 140.

At step 140, the SEmulation system provides a number of post-simulation analysis features. The system logs all inputs to the hardware model. For hardware model outputs, the system logs all values of hardware register components at a user-defined logging frequency (e.g., 1/10,000 record/cycle). The logging frequency determines how often the output values are recorded; at 1/10,000 record/cycle, output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for later post-simulation analysis. Because the logging frequency directly affects SEmulation speed, the user should select it with care: a higher logging frequency decreases SEmulation speed because the system must spend time and resources performing I/O operations to memory to record the output data before further simulation can proceed.

With respect to the post-simulation analysis, the user selects a particular point at which simulation is desired. The user can then perform analysis after SEmulation by running the software simulation with input logs to the hardware model to compute the value changes and internal states of all hardware components. Note that the hardware accelerator is used to simulate the data from the selected logging point to analyze simulation results. This post-simulation analysis method can link to any simulation waveform viewer for post-simulation analysis. More detailed discussion will follow.
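The logging arithmetic and the choice of a replay starting point can be sketched in Python. The sketch assumes one register snapshot after every N-th cycle, where N = 1/frequency. It also assumes that reaching an arbitrary target cycle means starting at the latest snapshot at or before it and replaying the logged inputs for the remaining cycles. The function names are hypothetical.

```python
from fractions import Fraction


def logging_interval(frequency: Fraction) -> int:
    """Cycles between register snapshots; 1/10,000 record/cycle -> 10,000."""
    if not 0 < frequency <= 1:
        raise ValueError("frequency must be in (0, 1] records per cycle")
    interval = 1 / frequency
    if interval.denominator != 1:
        raise ValueError("assumed: 1/frequency is a whole number of cycles")
    return interval.numerator


def records_for_run(total_cycles: int, frequency: Fraction) -> int:
    """Snapshots written when a snapshot follows every N-th cycle."""
    return total_cycles // logging_interval(frequency)


def replay_plan(target_cycle: int, frequency: Fraction) -> tuple[int, int]:
    """Latest logged cycle at or before the target, and cycles to replay from input logs."""
    n = logging_interval(frequency)
    start = (target_cycle // n) * n
    return start, target_cycle - start


f = Fraction(1, 10_000)
print(logging_interval(f), records_for_run(1_000_000, f))  # 10000 100
print(replay_plan(123_456, f))                             # (120000, 3456)
```

The trade-off in the text is visible here. Halving N doubles the number of snapshots, and their I/O cost. It also halves the worst-case number of cycles that must be replayed to reach any chosen point.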

At step 145, the user can opt to emulate the simulated circuit design within its target system environment. If step 145 resolves to "NO," the algorithm ends and the SEmulation process ends at step 155. If emulation with the target system is desired, the algorithm proceeds to step 150. This step involves activating the emulation interface board, plugging the cable and chip pin adapter to the target system, and running the target system to obtain the system I/O from the target system. The system I/O from the target system includes signals between the target system and the emulation of the circuit design. The emulated circuit design receives input signals from the target system, processes these, sends them to the SEmulation system for further processing, and outputs the processed signals to the target system. Conversely, the emulated circuit design sends output signals to the target system, which processes these, and possibly outputs the processed signals back to the emulated circuit design. In this way, the performance of the circuit design can be evaluated in its natural target system environment. After the emulation with the target system, the user has results that validate the circuit design or reveal non-functional aspects. At this point, the user can simulate/emulate again as indicated at step 135, stop altogether to modify the circuit design, or proceed to integrated circuit fabrication based on the validated circuit design.

III. SIMULATION/HARDWARE ACCELERATION MODES

A high level diagram of the software compilation and hardware configuration during compile time and run time in accordance with one embodiment of the present invention is shown in FIG. 3. FIG. 3 shows two sets of information: one set of information distinguishes the operations performed during compile time and simulation/emulation run time; and the other set of information shows the partitioning between software models and hardware models. At the outset, the SEmulation system in accordance with one embodiment of the present invention needs the user circuit design as input data 200. The user circuit design is in some form of HDL file (e.g., Verilog, VHDL). The SEmulation system parses the HDL file so that behavior level code, register transfer level code, and gate level code can be reduced to a form usable by the SEmulation system. The system generates a source design database for front end processing step 205. The processed HDL file is now usable by the SEmulation system. The parsing process converts ASCII data to an internal binary data structure and is known to those ordinarily skilled in the art. Please refer to ALFRED V. AHO, RAVI SETHI, AND JEFFREY D. ULLMAN, COMPILERS: PRINCIPLES, TECHNIQUES, AND TOOLS (1988), which is incorporated by reference herein.

Compile time is represented by processes 225 and run time is represented by processes/elements 230. During compilation time as indicated by process 225, the SEmulation system compiles the processed HDL file by performing component type analysis. The component type analysis classifies HDL components into combinational components, register components, clock components, memory components, and test-bench components. Essentially, the system partitions the user circuit design into control and evaluation components.

The SEmulation compiler 210 essentially maps the control components of the simulation into software and the evaluation components into software and hardware. The compiler 210 generates a software model for all HDL components. The software model is cast in code 215. Additionally, the SEmulation compiler 210 uses the component type information of the HDL file, selects or generates hardware logic blocks/elements from a library or module generator, and generates a hardware model for certain HDL components. The end result is a so-called "bitstream" configuration file 220.

In preparation for run-time, the software model in code form is stored in main memory where the application program associated with the SEmulation program in accordance with one embodiment of the present invention is stored. This code is processed in the general purpose processor or workstation 240. Substantially concurrently, the configuration file 220 for the hardware model is used to map the user circuit design into the reconfigurable hardware boards 250. Here, those portions of the circuit design that have been modeled in hardware are mapped and partitioned into the FPGA chips in the reconfigurable hardware boards 250.

As explained above, user test-bench stimulus and test vector data as well as other test-bench resources 235 are applied to the general purpose processor or workstation 240 for simulation purposes. Furthermore, the user can perform emulation of the circuit design via software control. The reconfigurable hardware boards 250 contain the user's emulated circuit design. This SEmulation system has the ability to let the user selectively switch between software simulation and hardware emulation, as well as stop either the simulation or emulation process at any time, cycle-by-cycle, to inspect values from every component in the model, whether register or combinational. Thus, the SEmulation system passes data between the test-bench 235 and the processor/workstation 240 for simulation and the test-bench 235 and the reconfigurable hardware boards 250 via data bus 245 and processor/workstation 240 for emulation. If a user target system 260 is involved, emulation data can pass between the reconfigurable hardware boards 250 and the target system 260 via the emulation interface 255 and data bus 245. The kernel is found in the software simulation model in the memory of the processor/workstation 240 so data necessarily pass between the processor/workstation 240 and the reconfigurable hardware boards 250 via data bus 245.

FIG. 4 shows a flow chart of the compilation process in accordance with one embodiment of the present invention. The compilation process is represented as processes 205 and 210 in FIG. 3. The compilation process in FIG. 4 starts at step 300. Step 301 processes the front end information. Here, gate level HDL code is generated. The user has converted the initial circuit design into HDL form by directly handwriting the code or using some form of schematic or synthesis tool to generate the gate level HDL representations of the code. The SEmulation system parses the HDL file (in ASCII format) into a binary format so that behavior level code, register transfer level (RTL) code, and gate level code can be reduced to an internal data structure form usable by the SEmulation system. The system generates a source design database containing the parsed HDL code.

Step 302 performs component type analysis by classifying HDL components into combinational components, register components, clock components, memory components, and test-bench components as shown in component type resource 303. The SEmulation system generates hardware models for register and combinational components, with some exceptions as discussed below. Test-bench and memory components are mapped in software. Some clock components (e.g., derived clocks) are modeled in hardware and others reside in the software/hardware boundary (e.g., software clocks).

Combinational components are stateless logic components whose output values are a function of current input values and do not depend on the history of input values. Examples of combinational components include primitive gates (e.g., AND, OR, XOR, NOT), selector, adder, multiplier, shifter, and bus drivers.

Register components are simple storage components. The state transition of a register is controlled by a clock signal. One form of register is edge-triggered, which may change state only when a clock edge is detected. Another form of register is the latch, which is level-triggered. Examples include flip-flops (D-type, JK-type) and level-sensitive latches.
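The difference between the two register forms can be sketched in Python. The sketch assumes a positive-edge D flip-flop, an active-high latch enable, and a known initial value of 0, since the hardware model cannot hold "x." The class names are illustrative only.

```python
class DFlipFlop:
    """Edge-triggered register: q samples d only on a rising clock edge."""

    def __init__(self) -> None:
        self.q = 0           # assumed known reset value (hardware has no "x")
        self._prev_clk = 0

    def step(self, clk: int, d: int) -> int:
        if self._prev_clk == 0 and clk == 1:  # rising edge detected
            self.q = d
        self._prev_clk = clk
        return self.q


class DLatch:
    """Level-triggered register: q follows d while enable is high, holds otherwise."""

    def __init__(self) -> None:
        self.q = 0

    def step(self, enable: int, d: int) -> int:
        if enable:
            self.q = d
        return self.q


clk = [0, 1, 1, 0, 0, 1]
d = [1, 1, 0, 0, 1, 0]
ff, latch = DFlipFlop(), DLatch()
print([ff.step(c, x) for c, x in zip(clk, d)])     # [0, 1, 1, 1, 1, 0]
print([latch.step(c, x) for c, x in zip(clk, d)])  # [0, 1, 0, 0, 0, 0]
```

At the third step, the clock stays high while d drops to 0. The flip-flop holds its value because there is no new edge, while the transparent latch follows d.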

Clock components are components that deliver periodic signals to logic devices to control their behavior. Typically, clock signals control the update of registers. Primary clocks are generated from self-timed test-bench processes. For example, a typical test-bench process for clock generation in Verilog is as follows:

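reg Clock;    // declaration assumed; the original shows only the process

always begin
  Clock = 0;  // drive the clock low
  #5;         // wait 5 time units
  Clock = 1;  // drive the clock high
  #5;         // wait 5 time units; the always block then repeats
end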

According to this code, the clock signal is initially at logic "0." After 5 time units, the clock signal changes to logic "1." After another 5 time units, it reverts to logic "0," so the clock has a period of 10 time units. Usually, the primary clock signals are generated in software, and only a few (e.g., 1-10) primary clocks are found in a typical user circuit design. Derived or gated clocks are generated from a network of combinational logic and registers that are in turn driven by the primary clocks. Many (e.g., 1,000 or more) derived clocks are found in a typical user circuit design.
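The distinction between primary and derived clocks can be sketched in Python. The Verilog always block is rewritten here as a generator of (time, value) events with a 5-unit half period. A derived clock is then built as a toggle register clocked by the primary clock's rising edges, which gives a divide-by-two clock. The divider is a hypothetical example of a derived clock, not a circuit taken from the specification.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, int]  # (simulation time, clock value)


def primary_clock(half_period: int = 5) -> Iterator[Event]:
    """Software equivalent of the Verilog always block above."""
    t = 0
    while True:
        yield t, 0           # Clock = 0;
        t += half_period     # #5;
        yield t, 1           # Clock = 1;
        t += half_period     # #5;


def divide_by_two(events: Iterable[Event]) -> Iterator[Event]:
    """A derived clock: a toggle register clocked by the primary clock's rising edges."""
    q, prev = 0, None
    for t, v in events:
        if prev == 0 and v == 1:  # rising edge of the primary clock
            q ^= 1
            yield t, q
        prev = v


print(list(islice(primary_clock(), 5)))                 # [(0, 0), (5, 1), (10, 0), (15, 1), (20, 0)]
print(list(islice(divide_by_two(primary_clock()), 3)))  # [(5, 1), (15, 0), (25, 1)]
```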

Memory components are block storage components with address and control lines to access individual data in specific memory locations. Examples include ROM, asynchronous RAM, and synchronous RAM.

Test-bench components are software processes used to control and monitor the simulation processes. Accordingly, these components are not part of the hardware circuit design under test. Test-bench components control the simulation by generating clock signals, initializing simulation data, and reading simulation test vector patterns from disk/memory. Test-bench components also monitor the simulation by checking for changes in value, performing value change dump, checking asserted constraints on signal value relations, writing output test vectors to disk/memory, and interfacing with various waveform viewers and debuggers.

The SEmulation system performs component type analysis as follows. The system examines the binary source design database and, based on it, classifies each element as one of the above component types. Continuous assignment statements are classified as combinational components. Gate primitives are, by language definition, either combinational components or latch-type register components. Initialization code is treated as a test-bench of initialization type.

An always process that drives nets without reading them is a test-bench of driver type. An always process that reads nets without driving them is a test-bench of monitor type. An always process with delay controls or multiple event controls is a test-bench of general type.

An always process with a single event control and driving a single net can be one of the following: (1) If the event control is an edge-triggered event, the process is an edge-triggered type register component. (2) If the net driven in the process is not assigned in all possible execution paths, the net is a latch type of register. (3) If the net driven in the process is assigned in all possible execution paths, the net is a combinational component.

An always process with a single event control but driving multiple nets can be decomposed into several processes, each driving one net. The component type of each decomposed process is then determined separately using the rules above.
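The classification rules can be collected into one Python function. This is a sketch, not the compiler's actual data model. The `Process` record and its fields are hypothetical summaries of what the source design database would provide. Gate primitives are omitted. The order in which the test-bench rules are checked is also an assumption, because the text does not say which rule wins when several apply; the clock generator, for example, both drives without reading and contains delays.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical summary of one HDL process from the source design database."""
    kind: str                                          # "assign", "initial", or "always"
    reads: set[str] = field(default_factory=set)       # nets the process reads
    drives: set[str] = field(default_factory=set)      # nets the process drives
    has_delay: bool = False                            # contains #delay controls
    event_controls: int = 0                            # number of @(...) controls
    edge_triggered: bool = False                       # single control uses posedge/negedge
    all_paths: set[str] = field(default_factory=set)   # nets assigned on every execution path


def classify(p: Process) -> dict[str, str]:
    """Map each driven net (or the whole process) to a component type."""
    if p.kind == "assign":
        return {net: "combinational" for net in p.drives}
    if p.kind == "initial":
        return {"<process>": "test-bench (initialization)"}
    # Assumed precedence: the driver/monitor checks run before the general check.
    if p.drives and not p.reads:
        return {"<process>": "test-bench (driver)"}
    if p.reads and not p.drives:
        return {"<process>": "test-bench (monitor)"}
    if p.has_delay or p.event_controls != 1:
        return {"<process>": "test-bench (general)"}
    # Single event control: decompose per driven net and type each one.
    types = {}
    for net in sorted(p.drives):
        if p.edge_triggered:
            types[net] = "register (edge-triggered)"
        elif net not in p.all_paths:
            types[net] = "register (latch)"
        else:
            types[net] = "combinational"
    return types


clock_gen = Process("always", drives={"Clock"}, has_delay=True)
flop = Process("always", reads={"clk", "d"}, drives={"q"},
               event_controls=1, edge_triggered=True)
print(classify(clock_gen))  # {'<process>': 'test-bench (driver)'}
print(classify(flop))       # {'q': 'register (edge-triggered)'}
```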

Step 304 generates a software model for all HDL components, regardless of component type. With the appropriate user interface, the user is capable of simulating the entire circuit design using the complete software model. Test-bench processes are used to drive the stimulus input, test vector patterns, control the overall simulation, and monitor the simulation process.

Step 305 performs clock analysis. The clock analysis includes two general steps: (1) clock extraction and sequential mapping, and (2) clock network analysis. The clock extraction and sequential mapping step includes mapping the user's register components into the SEmulation system's hardware register model and then extracting clock signals out of the system's hardware register components. The clock network analysis step includes determining primary clocks and derived clocks based on the extracted clock signals, and separating the gated clock network from the gated data network. A more detailed description will be provided with respect to FIG. 16.
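The primary/derived distinction in the clock network analysis can be illustrated by tracing each extracted clock back to its source. This sketch rests on stated assumptions, not on the procedure of FIG. 16: a clock is treated as primary when it is generated directly by a test-bench process, and as derived when it is the output of a register component (such as a clock divider) clocked by another clock. The `driver_of` mapping is a hypothetical representation, and gated-clock separation is not shown.

```python
from __future__ import annotations


def analyze_clocks(
    clock_nets: set[str], driver_of: dict[str, tuple[str, str | None]]
) -> dict[str, tuple[str, str]]:
    """Return {clock: (kind, root_primary_clock)} for each extracted clock.

    driver_of maps a clock net to ("test-bench", None) when a test-bench
    process generates it, or to ("register", source_clock) when it is the
    output of a register clocked by source_clock (hypothetical format).
    """
    result = {}
    for clk in sorted(clock_nets):
        seen = set()
        cur = clk
        while True:
            if cur in seen:
                raise ValueError(f"clock loop with no test-bench source at {cur}")
            seen.add(cur)
            kind, src = driver_of[cur]
            if kind == "test-bench":
                break  # reached a clock generated by the test-bench
            cur = src  # follow the register's clock input toward its source
        result[clk] = ("primary" if cur == clk else "derived", cur)
    return result
```

Recording the root primary clock for each derived clock is one way to group registers that ultimately change on the same test-bench clock edge.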

Step 306 performs residence selection. The system, in conjunction with the user, selects the components for hardware models; that is, of all the components that could be implemented in the hardware model of the user's circuit design, some will not be modeled in hardware for a variety of reasons. These reasons include component types, hardware resource constraints (e.g., floating point operations and large multiply operations stay in software), simulation and communication overhead (e.g., small bridge logic between test-bench processes stays in software, and signals that are monitored by test-bench processes stay in software), and user preferences. For reasons including performance and simulation monitoring, the user can force certain components that would otherwise be modeled in hardware to stay in software.
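The residence criteria listed above can be sketched as a rule-based filter. This is an illustration only: the Component fields are hypothetical, and the 32-bit cutoff for a "large" multiply is an assumed value, not one given in the text.

```python
from dataclasses import dataclass


@dataclass
class Component:
    # Hypothetical per-component summary used for residence selection.
    name: str
    ctype: str                         # "test-bench", "combinational", "register", ...
    uses_float: bool = False           # contains floating point operations
    multiplier_width: int = 0          # widest multiply operand, in bits
    bridges_testbenches: bool = False  # small bridge logic between test-benches
    monitored: bool = False            # signals monitored by a test-bench process
    force_software: bool = False       # user preference


LARGE_MULTIPLY_BITS = 32  # assumed threshold, not specified in the text


def select_residence(c: Component) -> str:
    """Return "software" or "hardware" for one component."""
    if c.ctype == "test-bench":
        return "software"  # component type: test-benches are software processes
    if c.uses_float or c.multiplier_width > LARGE_MULTIPLY_BITS:
        return "software"  # hardware resource constraint
    if c.bridges_testbenches or c.monitored:
        return "software"  # simulation and communication overhead
    if c.force_software:
        return "software"  # user preference
    return "hardware"
```

Every rule only moves components into software; anything that survives all of them becomes a candidate for mapping onto the FPGA array in step 307.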

Step 307 maps the selected hardware models into a reconfigurable hardware emulation board. In particular, step 307 takes the netlist and maps the circuit design into specific FPGA chips. This step involves grouping or clustering logic elements together. The system then assigns each group to a unique FPGA chip or several groups to a single FPGA chip, and may also split groups to assign them to different FPGA chips. A more detailed discussion will be provided below with respect to FIG. 6. The system places the hardware model components into a mesh of FPGA chips to minimize inter-chip communication overhead. In one embodiment, the array comprises a 4×4 array of FPGAs, a PCI interface