United States Patent
5812414
Butts, et al.
September 22, 1998
Title
Method for performing simulation using a hardware logic emulation system
Abstract
A plurality of electronically reconfigurable gate array (ERCGA) logic chips are interconnected via a reconfigurable interconnect, and electronic representations of large digital networks are converted to take temporary actual operating hardware form on the interconnected chips. The reconfigurable interconnect permits the digital network realized on the interconnected chips to be changed at will, making the system well suited for a variety of purposes including simulation, prototyping, execution and computing. The reconfigurable interconnect may comprise a partial crossbar that is formed of ERCGA chips dedicated to interconnection functions, wherein each such interconnect ERCGA is connected to at least one, but not all of the pins of a plurality of the logic chips. Other reconfigurable interconnect topologies are also detailed.
Inventors:
Butts; Michael R. (Portland, OR), Batcheller; Jon A. (Newberg, OR)
Assignee:
Quickturn Design Systems, Inc. (Mountain View, CA)
Appl. No.:
770655
Filed:
December 19, 1996
Current U.S. Class:
716/16
703/13
Current International Class:
G06F 17/50 (20060101)
Field of Search:
364/488,489,490,578 395/500
Primary Examiner:
Trans; Vincent N.
Attorney, Agent or Firm:
Lyon & Lyon LLP
Parent Case Text
RELATED APPLICATION DATA
This is a continuation of application Ser. No. 08/470,185, filed on Jun. 6, 1995, now abandoned, which is a division of application Ser. No. 08/245,310, filed on May 17, 1994, now U.S. Pat. No. 5,452,231, which is a continuation of application Ser. No. 07/923,361, filed on Jul. 31, 1992, now abandoned, which is a division of application Ser. No. 07/698,734, filed on May 10, 1991, now abandoned, which is a continuation-in-part of application Ser. No. 07/417,196, filed on Oct. 4, 1989, now U.S. Pat. No. 5,036,473, which is a continuation-in-part of application Ser. No. 07/254,463, filed on Oct. 5, 1988, now abandoned. Prosecution of the latter application continued in application Ser. No. 07/424,075, now abandoned.
Claims
We claim:
1. A method for stimulating a functional circuit with logical stimulus to determine what response is produced by the functional circuit from that input, the method comprising the steps:
(a) configuring a reconfigurable logic apparatus to implement the functional circuit, said reconfigurable logic apparatus comprising N reprogrammable logic devices, where N is a number greater than one, said N reprogrammable logic devices interconnected by reprogrammable interconnect devices, said functional circuit being implemented by at least two of said N reprogrammable logic devices;
(b) converting the logical stimulus into input electrical signals;
(c) inputting said electrical signals to said N reconfigurable logic apparatus which is configured with the functional circuit;
(d) receiving output electrical signals from said reconfigurable logic apparatus; and
(e) converting said output electrical signals into software form.
2. The method of claim 1 wherein each step is performed in seriatim from step (a) to step (e).
3. The method of claim 1 wherein each of said N reprogrammable logic devices comprises a field programmable gate array.
4. The method of claim 1 wherein said software form is translatable by a computer into a user-readable format.
5. A method of simulating a functional circuit design, the method comprising the steps of:
(a) configuring a reconfigurable logic apparatus to implement the functional circuit, said reconfigurable logic apparatus comprising N reprogrammable logic devices, where N is a number greater than one, said N reprogrammable logic devices interconnected by reprogrammable interconnect devices, said functional circuit being implemented by at least two of said N reprogrammable logic devices;
(b) providing a set of stimulus to said reconfigurable logic apparatus having the functional circuit configured therein;
(c) collecting a set of responses to said set of stimulus from said reconfigurable logic apparatus having the functional circuit configured therein; and
(d) converting said set of responses from a machine readable form to a user-readable form.
6. The method of claim 5 wherein said providing step comprises the steps of:
(a) converting said set of stimulus from a netlist format into a set of input vectors, said input vectors comprising binary data capable of loading into said N reprogrammable logic devices; and
(b) loading said binary data into said N reprogrammable logic devices.
7. The method of claim 5 wherein each step is performed in seriatim from step (a) to step (d).
8. The method of claim 5 wherein each of said N reprogrammable logic devices comprises a field programmable gate array.
Description
FIELD OF THE INVENTION
The present invention relates to reconfigurable hardware simulators (more precisely here termed "emulators") which employ electronically reconfigurable gate array logic elements (ERCGAs). The claimed invention more particularly relates to hybrid simulation methods and apparatuses wherein such a reconfigurable hardware emulator is used in conjunction with a second simulator, such as an event driven simulator, to permit fast and detailed analysis of a logic circuit's operation.
BACKGROUND AND SUMMARY OF THE INVENTION
For expository convenience, the present application refers to the present invention as a Realizer™ system, the lexicon being devoid of a succinct descriptive name for a system of the type hereinafter described.
The Realizer system comprises hardware and software that turns representations of large digital logic networks into temporary actual operating hardware form, for the purpose of simulation, prototyping, execution or computing. (A digital logic network is considered "large" when it contains too many logic functions to fit in a few of the largest available configurable logic devices.)
The following discussions will be made clearer by a brief review of the relevant terminology as it is typically (but not exclusively) used.
To "realize" something is to make it real or actual. To realize all or part of a digital logic network or design is to cause it to take actual operating form without building it permanently.
An "input design" is the representation of the digital logic network which is to be realized. It contains primitives representing combinational logic and storage, as well as instrumentation devices or user-supplied actual devices, and nets representing connections among primitive input and output pins.
To "configure" a logic chip or interconnect chip is to cause its internal logic functions and/or interconnections to be arranged in a particular way. To configure a Realizer system for an input design is to cause its internal logic functions and interconnections to be arranged according to the input design.
To "convert" a design is to convert its representation into a file of configuration data, which, when used directly to configure Realizer hardware, will cause the design to be realized.
To "operate" a design is to cause Realizer hardware, which is configured according to the input design's representations, to actually operate.
An "interconnect" is a reconfigurable means for passing logic signals between a large number of chip I/O pins as if the pins were interconnected with wires.
A "path" is one of the built-in interconnection wires between a logic chip and a crossbar chip in a partial crossbar interconnect, or between crossbar chips in a hierarchy of partial crossbars.
A "path number" specifies a particular path, out of the many that may interconnect a pair of chips.
An "ERCGA" is an electronically reconfigurable gate array, that is, a collection of combinational logic and input/output connections (and optionally storage) whose functions and interconnections can be configured and reconfigured many times over, purely by applying electronic signals.
A "logic chip" is an ERCGA used to realize the combinational logic, storage and interconnections of an input design in the Realizer system.
An "Lchip" is a logic chip, or a memory module or user-supplied device module which is installed in place of a logic chip.
An "interconnect chip" is an electronically reconfigurable device which can implement arbitrary interconnections among its I/O pins.
A "routing chip" is an interconnect chip used in a direct or channel-routing interconnect.
A "crossbar chip" is an interconnect chip used in a crossbar or partial crossbar interconnect.
An "Xchip" is a crossbar chip in the partial crossbar which interconnects Lchips. A "Ychip" is a crossbar chip in the second level of a hierarchical partial crossbar interconnect, which interconnects Xchips. A "Zchip" is a crossbar chip in the third level of a hierarchical partial crossbar interconnect, which interconnects Ychips.
A "logic board" is a printed circuit board carrying logic and interconnect chips. A "box" is a physical enclosure, such as a cardcage, containing one or more logic boards. A "rack" is a physical enclosure containing one or more boxes.
A "system-level interconnect" is one which interconnects devices larger than individual chips, such as logic boards, boxes, racks and so forth.
A "Logic Cell Array" or "LCA" is a particular example of ERCGA which is manufactured by Xilinx, Inc., and others and is used in the preferred embodiment.
A "configurable logic block" or "CLB" is a small block of configurable logic and flip-flops, which represent the combinational logic and storage in an LCA.
A "design memory" is a memory device which realizes a memory function specified in the input design.
A "vector memory" is a memory device used to provide a large body of stimulus signals to and/or collect a large body of response signals from a realized design in the Realizer system.
A "stimulator" is a device in the Realizer system used to provide stimulus signals to an individual input of a realized design. A "sampler" is a device in the Realizer system used to collect response signals from an individual output of a realized design.
A "host computer" is a conventional computer system to which the Realizer system's host interface hardware is connected, and which controls the configuration and operation of the Realizer hardware.
An "EDA system" is an electronic design automation system, that is, a system of computer-based tools used for creating, editing and analyzing electronic designs. The host EDA system is the one which generates the input design file in most Realizer system applications.
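The interconnect terms defined above (Lchip, Xchip, path, path number) can be illustrated with a small model. The following is a minimal sketch only, not the routing method of the invention: it assumes every Lchip has the same number of paths to every Xchip, and it uses a simple first-fit search. The class name `PartialCrossbar`, its fields, and the `route` method are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PartialCrossbar:
    """Toy model: each Xchip reaches only a subset of each Lchip's pins."""
    n_lchips: int
    n_xchips: int
    paths_per_pair: int  # assumed number of paths between any Lchip and any Xchip
    used: set = field(default_factory=set)  # occupied (lchip, xchip, path number)

    def free_path(self, lchip, xchip):
        """Return the lowest unused path number between the pair, or None."""
        for path_number in range(self.paths_per_pair):
            if (lchip, xchip, path_number) not in self.used:
                return path_number
        return None

    def route(self, lchips):
        """Route one net touching the given Lchips through a single Xchip.

        First-fit search (an assumption of this sketch): take the first
        Xchip that still has a free path to every Lchip on the net.
        """
        for lchip in lchips:
            if not 0 <= lchip < self.n_lchips:
                raise ValueError(f"no such Lchip: {lchip}")
        for xchip in range(self.n_xchips):
            paths = {lchip: self.free_path(lchip, xchip) for lchip in lchips}
            if all(p is not None for p in paths.values()):
                for lchip, p in paths.items():
                    self.used.add((lchip, xchip, p))
                return xchip, paths
        return None  # no single Xchip can carry this net


xbar = PartialCrossbar(n_lchips=4, n_xchips=2, paths_per_pair=1)
first = xbar.route([0, 1])   # uses Xchip 0
second = xbar.route([0, 2])  # Lchip 0's path to Xchip 0 is taken, so Xchip 1
third = xbar.route([0, 3])   # None: Lchip 0 has no free paths left
```

In this model each Lchip has `n_xchips * paths_per_pair` interconnect pins, and each Xchip sees only `paths_per_pair` of them, which mirrors the statement in the Abstract that each interconnect chip connects to at least one, but not all, of the pins of the logic chips.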
If a reconfigurable gate array with enough capacity to hold a single large design were available, then much of the Realizer technology would be unnecessary. However, this will never be the case, for two reasons.
First, ERCGAs cannot have as much logic capacity as a non-reconfigurable integrated circuit of the same physical size made with the same fabrication technology. The facilities for reconfigurability take up substantial space on the chip. An ERCGA must have switching transistors to direct signals and storage transistors to control those switches, where a non-reconfigurable chip just has a metal trace, and can put those transistors to use as logic. The regularity required for a reconfigurable chip also means that some resources will go unused in real designs, since placement and routing of regular logic structures are never able to use 100% of the available gates. These factors combine to make ERCGAs have about one tenth the logic capacity of non-reconfigurable chips. In actual current practice, the highest gate capacity claimed for an ERCGA is 9,000 gates (Xilinx XC3090). Actual semi-custom integrated circuits fabricated with similar technology offer over 100,000 gate logic capacity (Motorola).
Second, it is well known that real digital systems are built with many integrated circuits, typically ten to one hundred or more, often on many printed circuit boards. If an ERCGA did have as much logic capacity as the largest integrated circuit, it would still take many such chips to realize most digital systems. Since it does not, still more are required.
Consequently, for a Realizer system to have the logic capacity of even a single large-scale chip, it should have many ERCGAs, on the order of ten. To have the capacity for a system of such chips, on the order of hundreds of ERCGAs are required. Note that this is true regardless of the specific fabrication capabilities. If a fabrication process can double the capacity of ERCGAs by doubling the number of transistors per chip, then non-reconfigurable chip capacities and therefore overall design sizes will double, as well.
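The arithmetic behind these estimates can be checked with a short calculation. This is a sketch using only the two gate counts quoted above (9,000 gates for the ERCGA, 100,000 gates for a semi-custom chip); the function name `ercgas_needed` and the assumption of full utilization by default are illustrative, not part of the original text.

```python
import math

ERCGA_GATES = 9_000          # highest claimed ERCGA capacity quoted in the text
CUSTOM_CHIP_GATES = 100_000  # semi-custom chip capacity quoted in the text


def ercgas_needed(design_gates, ercga_gates=ERCGA_GATES, utilization=1.0):
    """Number of ERCGAs required for a design of the given gate count.

    `utilization` is a hypothetical knob for the fraction of each ERCGA
    that placement and routing can actually use (1.0 is optimistic).
    """
    return math.ceil(design_gates / (ercga_gates * utilization))


# One large-scale chip: on the order of ten ERCGAs.
one_chip = ercgas_needed(CUSTOM_CHIP_GATES)  # 12

# A system of ten to one hundred such chips: hundreds of ERCGAs or more.
system = [ercgas_needed(n * CUSTOM_CHIP_GATES) for n in (10, 100)]  # [112, 1112]

# Doubling both capacities leaves the count unchanged, as the text argues.
doubled = ercgas_needed(2 * CUSTOM_CHIP_GATES, ercga_gates=2 * ERCGA_GATES)  # 12
```

Any utilization below 1.0 only raises these counts, so the figures are lower bounds under the stated assumptions.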
For these reasons, to build a useful Realizer system, it is necessary to be able to interconnect hundreds of ERCGAs in an electronically reconfigurable way, and to convert designs into configurations for hundreds of ERCGAs. This invention does not cover the technology of any ERCGA itself, only the techniques for building a Realizer system out of many ERCGAs.
ERCGA technology does not show how to build a Realizer system, because the problems are different. ERCGA technology for reconfigurably interconnecting logic elements which are all part of one IC chip does not apply to interconnecting many. ERCGA interconnections are made simply by switching transistors that pass signals in either direction. Since there are no barriers across one chip, there are a large number of paths available for interconnections to take. Since the chip is small, signal delays are small. Interconnecting many ERCGAs is a different problem, because IC package pins and printed circuit boards are involved. The limited number of pins available means a limited number of paths for interconnections. Sending signals onto and off of chips must be done through active (i.e. amplifying) pin buffers, which can only send signals in one direction. These buffers and the circuit board traces add delays which are an order of magnitude greater than the on-chip delays. The Realizer system's interconnection technology solves these problems in a very different way than the ERCGA.
Finally, the need to convert a design into configurations for many chips is not addressed by ERCGA technology. The Realizer system's interconnect is entirely different than that inside an ERCGA, and an entirely different method of determining and configuring the interconnect is required.
ERCGAs are made with the fastest and densest silicon technology available at any given time. (1989 Xilinx XC3000 LCAs are made in 1 micron SRAM technology.) That is the same technology as the fastest and densest systems to be realized. Because ERCGAs are general and have reconfigurable interconnections, they will always be a certain factor less dense than contemporary gate arrays and custom chips. Realizer systems repeat the support for generality and reconfigurability above the ERCGA level. Therefore, a Realizer system is always a certain factor, roughly one order of magnitude, less dense than the densest contemporary systems. Board-level Realizer systems realize gate arrays, box-level Realizer systems realize boards and large custom chips, and rack-level Realizer systems realize boxes.
Design architectures are strongly affected by the realities of packaging. I/O pin width: at the VLSI chip level, 100 I/O pins are easily built, 200 pins are harder but not uncommon, and 400 pins are almost unheard of. At the board level, these figures roughly double. Logic densities: boards often accommodate 5 VLSI chips, 10 are possible, and 20 are unusual, simply because practical boards are limited to about 200 square inches maximum. Boxes accommodate 10 to 20 boards, rarely 40. Interconnect densities: modules may be richly interconnected on chips and boards, as several planes of two-dimensional wiring are available, but less so at the box level and above, as backplanes are essentially one-dimensional.
These packaging restrictions have a strong effect on system architectures that should be observed in effective Realizer systems. Because of the lower density in a Realizer system, a single logic chip will usually be realizing only a module in the realized design. A one-board logic chip complex will be realizing a VLSI chip or two, a box of Realizer boards will realize a single board in the design, and a rack of boxes will realize the design's box of boards.
Thus, a Realizer system's board-level logic and interconnect complex needs to have as much logic and interconnect capacity and I/O pin width as the design's VLSI chip. The Realizer system's box needs as much as the design's board, and the Realizer system's rack needs as much as the design's box.
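The level-by-level correspondence described above can be written down as a simple lookup together with a capacity check. This is a sketch under stated assumptions: the `Resources` type, the `can_realize` function, and the numeric values for the example board are hypothetical and chosen only to be consistent with the gate and pin figures mentioned earlier in this section.

```python
from dataclasses import dataclass

# Which Realizer packaging level realizes which level of the design,
# following the correspondence stated in the text.
REALIZER_LEVEL_FOR = {
    "module": "logic chip",
    "VLSI chip": "board",
    "board": "box",
    "box": "rack",
}


@dataclass(frozen=True)
class Resources:
    gates: int    # logic capacity
    io_pins: int  # I/O pin width


def can_realize(realizer: Resources, design: Resources) -> bool:
    """A Realizer level must match or exceed the design level it realizes."""
    return realizer.gates >= design.gates and realizer.io_pins >= design.io_pins


# Illustrative numbers only: a 100,000-gate, 200-pin design chip realized
# on a board of twelve 9,000-gate logic chips with 256 external pins.
design_chip = Resources(gates=100_000, io_pins=200)
realizer_board = Resources(gates=12 * 9_000, io_pins=256)
fits = can_realize(realizer_board, design_chip)  # True
```

A single 9,000-gate logic chip fails the same check against the design chip, which is why the text assigns it to a module rather than a whole VLSI chip.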
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic block diagram of a Realizer hardware system.
FIG. 2 is a schematic block diagram of a direct interconnect system.
FIG. 3 is a schematic block diagram of a channel-routing interconnect system.
FIGS. 4 and 4A are schematic block diagrams of a crossbar interconnect system.
FIGS. 5 and 5A are schematic block diagrams of a crossbar-net interconnect system.
FIG. 6 is a schematic block diagram of a simple specific example of a partial crossbar interconnect system.
FIG. 7 is a schematic block diagram of a partial crossbar interconnect system.
FIGS. 8a and 8b illustrate a difference in crossbar chip width.
FIG. 9 is a schematic block diagram of a tri-state net.
FIG. 10 is a schematic block diagram of a sum-of-products equivalent to the tri-state net of FIG. 9.
FIGS. 11a and 11b are schematic block diagrams of "floating low" and "floating high" sum of products networks.
FIG. 12 is a schematic block diagram of drivers and receivers collected to minimize interconnect.
FIG. 13 is a schematic block diagram of a logic summing configuration.
FIG. 14 is a schematic block diagram of a crossbar summing configuration.
FIG. 15 is a schematic block diagram of a bidirectional crossbar summing configuration.
FIG. 16 is a schematic block diagram of a bidirectional crossbar tri-state configuration.
FIG. 17 is a schematic block diagram showing off-board connections from a partial crossbar.
FIG. 18 is a schematic block diagram of a Y-level partial crossbar interconnect.
FIG. 19 is a schematic block diagram of a bidirectional bus system-level interconnect.
FIG. 20 is a schematic block diagram showing eight boards on a common bus interconnect.
FIG. 21 is a schematic block diagram showing the hierarchy of two bus levels.
FIG. 22 is a schematic block diagram showing a maximum bus interconnect hierarchy.
FIG. 23 is a schematic block diagram of a general memory module architecture.
FIG. 24 is a schematic block diagram of a memory address logic chip.
FIG. 25 is a schematic block diagram of a memory data logic chip using common I/O.
FIG. 26 is a schematic block diagram of a memory data logic chip using separate I/O.
FIG. 27 is a schematic block diagram showing multiple RAMs on one data bit.
FIG. 28 is a schematic block diagram of a preferred embodiment of a memory module.
FIG. 29 is a schematic block diagram of a stimulus vector memory.
FIG. 30 is a schematic block diagram of a response vector memory.
FIG. 31 is a schematic block diagram of a vector memory for stimulus and response.
FIG. 32 is a schematic block diagram of a preferred embodiment of a vector memory address chip.
FIG. 33 is a schematic block diagram of a preferred embodiment of a vector memory data chip.
FIG. 34 is a schematic block diagram of random-access stimulators.
FIG. 35 is a schematic block diagram of edge-sensitive stimulators.
FIG. 36 is a schematic block diagram of samplers.
FIG. 37 is a schematic block diagram of change-detecting samplers.
FIG. 38 is a schematic block diagram of a user-supplied device module architecture.
FIG. 39 is a schematic block diagram of a preferred embodiment of a USDM with devices installed.
FIG. 40 is a schematic block diagram of a configuration group.
FIG. 41 is a schematic block diagram of a host interface architecture.
FIG. 42 illustrates RBus read and write cycles.
FIG. 43 is a schematic block diagram of a Realizer design conversion system.
FIGS. 44a and 44b illustrate design data structure used in the present invention.
FIGS. 45a, 45b and 45c illustrate primitive conversion used in the present invention.
FIG. 46 illustrates moving a primitive into a cluster.
FIGS. 47a, 47b and 47c illustrate a simple net interconnection.
FIGS. 48a, 48b and 48c illustrate tri-state net interconnection.
FIG. 49 is a schematic block diagram of a Realizer logic simulation system.
FIGS. 50a-c schematically illustrate Realizer system configuration of multi-site logic.
FIGS. 51a-b schematically illustrate a delay-dependent functionality example.
FIGS. 52a-c schematically illustrate a unit delay configuration example.
FIGS. 53a-c schematically illustrate a real delay configuration.
FIG. 54 is a schematic block diagram of a Realizer fault simulation system.
FIG. 55 is a schematic block diagram of a Realizer logic simulator evaluation system.
FIG. 56 is a schematic block diagram of a Realizer prototyping system.
FIG. 57 illustrates a digital computer example on a Realizer prototyping system.
FIG. 58 is a schematic block diagram of a virtual logic analyzer configuration.
FIG. 59 is a schematic block diagram of a Realizer production system.
FIG. 60 is a schematic block diagram of a Realizer computing system.
FIGS. 61a-c illustrate the general architecture of the preferred embodiment, including the hierarchical interconnection of logic boards, boxes and rack.
FIGS. 62a-b show the physical construction of a logic board box and a Z-level box.
DETAILED DESCRIPTION
TABLE OF CONTENTS
1. Realizer Hardware System
1.1 Logic and interconnect Chip Technology
1.2 Interconnect Architecture
1.2.1 Nearest-Neighbor Interconnects
1.2.2 Crossbar Interconnects
1.2.3 Interconnecting Tri-State Nets
1.2.4 System-Level Interconnect
1.3 Special-Purpose Elements
1.3.1 Design Memory
1.3.2 Stimulus/Response
1.3.3 User-Supplied Devices
1.4 Configuration
1.5 Host interface
2. Realizer Design Conversion System
2.1 Design Reader
2.2 Primitive Conversion
2.3 Partitioning
2.4 Netlisting & Interconnection
3. Realizer Applications
3.1 Realizer Logic Simulation System
3.1.1 Logic Sim. Stimulus & Response Translation System
3.1.2 Logic Simulation Operating Kernel
3.1.3 Using the Realizer Logic Simulation System
3.1.4 Realization of More Than Two States
3.1.5 Realizer Representation of Delay
3.1.6 Transferring State From a Realizer Sim. into Another Sim.
3.2 Realizer Fault Simulation System
3.3 Realizer Logic Simulator Evaluation System
3.4 Realizer Prototyping System
3.4.1 Realized Virtual Instruments
3.5 Realizer Execution System
3.6 Realizer Production System
3.7 Realizer Computing System
4. Preferred Embodiment
4.1 Hardware
4.2 Software
1 Realizer Hardware System
The Realizer hardware system (FIG. 1) consists of:
1) A set of Lchips, consisting of:
   1) At least two logic chips (normally tens or hundreds).
   2) Optionally, one or more special-purpose elements, such as memory modules and user-supplied device modules.
2) A configurable interconnect, connected to all Lchip interconnectable I/O pins.
3) A host interface, connected to the host computer, the configuration system, and to all devices which can be used by the host for data input/output or control.
4) A configuration system, connected to the host interface, and to all configurable Lchip and interconnect devices.
This hardware is normally packaged in the form of logic boards, boxes and racks, and is connected to and is operated under the control of the host computer.
1.1 Logic & interconnect Chip Technology
1.1.1 Logic Chip Devices
For a device to be useful as a Realizer logic chip, it should be an electronically reconfigurable gate array (ERCGA):
1) It should have the ability to be configured according to any digital logic network consisting of combinational logic (and optionally storage), subject to capacity limitations.
2) It should be electronically reconfigurable, in that its function and internal interconnect may be configured electronically any number of times to suit many different logic networks.
3) It should have the ability to freely connect I/O pins with the digital network, regardless of the particular network or which I/O pins are specified, to allow the Realizer system partial crossbar or direct interconnect to successfully interconnect logic chips.
An example of a reconfigurable logic chip which is suitable for logic chips is the Logic Cell Array (LCA) ("The Programmable Gate Array Handbook", Xilinx, Inc., San Jose, Calif., 1989). It is manufactured by Xilinx, Inc., and others. This chip consists of a regular 2-dimensional array of Configurable Logic Blocks (CLBs), surrounded by reconfigurable I/O Blocks (IOBs), and interconnected by wiring segments arranged in rows and columns among the CLBs and IOBs. Each CLB has a small number of inputs, a multi-input combinational logic network, whose logic function can be reconfigured, one or more flip-flops, and one or more outputs, which can be linked together by reconfigurable interconnections inside the CLB. Each IOB can be reconfigured to be an input or output buffer for the chip, and is connected to an external I/O pin. The wiring segments can be connected to CLBs, IOBs, and each other, to form interconnections among them, through reconfigurable pass transistors and interconnect matrices. All reconfigurable features are controlled by bits in a serial shift register on the chip. Thus the LCA is entirely configured by shifting in the "configuration bit pattern", which takes between 10 and 100 milliseconds. Xilinx 2000 and 3000 series LCAs have between 64 and 320 CLBs, with between 56 and 144 IOBs available for use.
The LCA netlist conversion tool (described below) maps logic onto CLBs so as to optimize the interconnections among CLBs and IOBs. The configurability of interconnect between CLBs and the I/O pins gives the LCA the ability to freely connect I/O pins with the digital network, regardless of the particular network or which I/O pins are specified. The preferred implementation of the Realizer system uses LCA devices for its logic chips.
Another type of ERCGA which is suitable for logic chips is the ERA, or electrically reconfigurable array. A commercial example is the Plessey ERA60K-type device. It is configured by loading a configuration bit pattern into a RAM in the part. The ERA is organized as an array of two-input NAND gates, each of which can be independently interconnected with others according to values in the RAM which switch the gates' input connections to a series of interconnection paths. The ERA60100 has about 10,000 NAND gates. I/O cells on the periphery of the array are used to connect gate inputs and/or outputs to external I/O pins. The ERA netlist conversion tool maps logic onto the gates so as to optimize the interconnections among them, and generates a configuration bit pattern file, as described below. The configurability of interconnect between gates and the I/O cells gives the ERA the ability to freely connect I/O pins with the digital network, regardless of the particular network, or which I/O pins are specified.
Still another type of reconfigurable logic chip which could be used as a logic chip is the EEPLD, or electrically erasable programmable logic device ("GAL Handbook", Lattice Semiconductor Corp., Portland, Oreg., 1986). A commercial example is the Lattice Generic Array Logic (GAL). It is configured by loading a bit pattern into the part which configures the logic. The GAL is organized as a sum-of-products array with output flip-flops, so it is less generally configurable than the Xilinx LCA. It offers freedom of connection of I/O pins to logic only among all input pins and among all output pins, so it partially satisfies that requirement. It is also smaller, with 10 to 20 I/O pins. It can, however, be used as a Realizer logic chip.
Additional details on programmable logic chips can be found in U.S. Pat. Nos. 4,642,487, 4,700,187, 4,706,216, 4,722,084, 4,724,307, 4,758,985, 4,768,196 and 4,786,904, the disclosures of which are incorporated herein by reference.
1.1.2 Interconnect Chip Devices
Interconnect chips include crossbar chips, used in full and partial crossbar interconnects, and routing chips, used in direct and channel-routed interconnects. For a device to be useful as a Realizer interconnect chip:
1) It should have the ability to establish many logical interconnections between arbitrarily chosen groups of I/O pins at once, each interconnection receiving logic signals from its input I/O pin and driving those signals to its output I/O pin(s).
2) It should be electronically reconfigurable, in that its interconnect is defined electronically, and may be redefined to suit many different designs.
3) If a crossbar summing technique is used to interconnect tri-state nets in the partial crossbar interconnect, it should be able to implement summing gates. (If not, other tri-state techniques are used, as discussed in the tri-state section.)
The ERCGA devices discussed above, namely the LCA, the ERA and the EEPLD, satisfy these requirements, so they may be used as interconnect chips. Even though little or no logic is used in the interconnect chip, the ability to be configured into nearly any digital network includes the ability to pass data directly from input to output pins. The LCA is used for crossbar chips in the preferred implementation of the Realizer system.
Crossbar switch devices, such as the TI 74AS8840 digital crossbar switch (SN74AS8840 Data Sheet, Texas Instruments, Dallas Tex., 1987), or the crosspoint switch devices commonly used in telephone switches, may be used as interconnect chips. However, they offer a speed of reconfiguration comparable to the speed of data transfer, as they are intended for applications where the configuration is dynamically changing during operation. This is much faster than the configuration speed of the ERCGA devices. Consequently, such devices have higher prices and lower capacities than the ERCGAs, making them less desirable Realizer interconnection chips.
1.1.3 ERCGA Configuration Software
The configuration bit patterns, which are loaded into an ERCGA to configure its logic according to a user's specifications, are impractical for the user to generate on his own. Therefore, manufacturers of ERCGA devices commonly offer netlist conversion software tools, which convert logic specifications contained in a netlist file into a configuration bit pattern file.
The Realizer design conversion system uses the netlist conversion tools provided by the ERCGA vendor(s). Once it has read in the design, converted it, partitioned it into logic chips, and determined the interconnect, it generates netlists for each logic and interconnect chip in the Realizer hardware. The netlist file is a list of all primitives (gates, flip-flops, and I/O buffers) and their interconnections which are to be configured in a single logic or interconnect chip.
The Realizer design conversion system applies the ERCGA netlist conversion tool to each netlist file, to get a configuration file for each chip. When different devices are used for logic chips and interconnect chips, the appropriate tool is used in each case. The configuration file contains the binary bit patterns which, when loaded into the ERCGA device, will configure it according to the netlist file's specifications. It then collects these files into a single binary file which is permanently stored, and used to configure the Realizer system for the design before operation. The Realizer design conversion system conforms to the netlist and configuration file formats defined by the ERCGA vendor for its tool.
1.1.4 Netlist Conversion Tools
Since the preferred implementation of the Realizer system uses LCAs for logic and crossbar chips, the Xilinx LCA netlist conversion tool and its file formats are described here. Other ERCGA netlist conversion tools will have similar characteristics and formats.
Xilinx's LCA netlist conversion tool (XACT) takes the description of a logic network in netlist form and automatically maps the logic elements into CLBs. This mapping is made in an optimal way with respect to I/O pin locations, to facilitate internal interconnection. Then the tool works out how to configure the logic chip's internal interconnect, creating a configuration file as its output result. The LCA netlist conversion tool only converts individual LCAs, and fails if the logic network is too large to fit into a single LCA.
The Xilinx LCA netlist file is called an XNF file. It is an ASCII text file, containing a set of statements for each primitive, specifying the type of primitive, the pins, and the names of nets connected to those pins. Note that these nets are interconnections in the LCA netlist, connecting LCA primitives, not the nets of the input design. Some nets in the XNF file directly correspond to nets of the input design as a result of design conversion; others do not.
For example, these are the XNF file primitive statements which specify a 2-input XOR gate, named `I_1781`, whose input pins are connected to nets named `DATA0` and `INVERT`, and whose output pin is connected to a net named `RESULT`:
______________________________________
SYM,I_1781,XOR
PIN,O,O,RESULT
PIN,1,I,DATA0
PIN,0,I,INVERT
END
______________________________________
Input and output I/O pin buffers (IBUF, for input, and OBUF, for output) are specified in a similar way, with the addition of a statement for specifying the I/O pin. These are the primitive statements for the OBUF which drives net `RESULT` onto I/O pin `P57`, via a net named `RESULT_D`:
______________________________________
SYM,IA_1266,OBUF
PIN,O,O,RESULT_D
PIN,I,I,RESULT
END
EXT,RESULT_D,O,,LOC=P57
______________________________________
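As an illustration only, the two listings above can be generated by a few lines of Python. This is a minimal sketch that reproduces just the statement shapes shown in the text (SYM, PIN, END, EXT); it is not a complete XNF writer, and the helper names `xnf_symbol` and `xnf_output_pad` are hypothetical, not part of any Xilinx tool.

```python
def xnf_symbol(name, kind, pins):
    """Return the XNF statements for one primitive (hypothetical helper).

    pins is a list of (pin_name, direction, net_name) tuples, where
    direction is 'I' for an input pin or 'O' for an output pin.
    """
    lines = [f"SYM,{name},{kind}"]
    for pin_name, direction, net in pins:
        lines.append(f"PIN,{pin_name},{direction},{net}")
    lines.append("END")
    return lines


def xnf_output_pad(net, location):
    """Return the EXT statement tying an output net to an I/O pin location."""
    return [f"EXT,{net},O,,LOC={location}"]


# The 2-input XOR gate from the first listing.
xor_gate = xnf_symbol(
    "I_1781", "XOR",
    [("O", "O", "RESULT"), ("1", "I", "DATA0"), ("0", "I", "INVERT")],
)

# The OBUF driving net RESULT onto pin P57 from the second listing.
obuf = xnf_symbol(
    "IA_1266", "OBUF",
    [("O", "O", "RESULT_D"), ("I", "I", "RESULT")],
) + xnf_output_pad("RESULT_D", "P57")

print("\n".join(xor_gate + obuf))
```

In the design conversion system, statements of this kind would be issued once per primitive into the netlist file of each logic or interconnect chip.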
The Xilinx LCA configuration file is called an RBT file. It is an ASCII text file, containing some header statements identifying the part to be configured, and a stream of `0`s and `1`s, specifying the binary bit pattern to be used to configure the part for operation.
1.2 Interconnect Architecture
Since in practice, many logic chips must be used to realize a large input design, the logic chips in a Realizer system are connected to a reconfigurable interconnect, which allows signals in the design to pass among the separate logic chips as needed. The interconnect consists of a combination of electrical interconnections and/or interconnecting chips. To realize a large design with the Realizer system, hundreds of logic chips, with a total of tens of thousands of I/O pins, must be served by the interconnect.
An interconnect should be economically extensible as system size grows, easy and reliable to configure for a wide variety of input designs, and fast, minimizing delay between the logic chips. Since the average number of pins per net in real designs is a small number, which is independent of design size, the size and cost of a good interconnect should increase directly as the total number of logic chip pins to be connected increases. Given a particular logic chip capacity, the number of logic chips, and thus the number of logic chip pins, will go up directly as design capacity goes up. Thus the size and cost of a good interconnect should also vary directly with the design capacity.
Two classes of interconnect architectures are described: Nearest-neighbor interconnects are described in the first section, and Crossbar interconnects are described in the following section. Nearest-neighbor interconnects are organized with logic chips and interconnect intermixed and arranged according to a surface of two, three or more dimensions. They extend the row-and-column organization of a gate array chip or printed circuit board into the organization of logic chips. Their configuration for a given input design is determined by a placement and routing process similar to that used when developing chips and boards. Crossbar interconnects are distinct from the logic chips being interconnected. They are based on the many-input-to-many-output organization of crossbars used in communications and computing, and their configuration is determined in a tabular fashion.
Nearest-neighbor interconnects grow in size directly as logic capacity grows, but as routing pathways become congested large interconnects become slow and determining the configuration becomes difficult and unreliable. Pure crossbars are very fast because of their directness and are very easy to configure because of their regularity, but they grow to impractical size very quickly. The partial crossbar interconnect preserves most of the directness and regularity of the pure crossbar, but it only grows directly with design capacity, making it an ideal Realizer interconnect. While practical Realizer systems are possible using the other interconnects shown, the partial crossbar is used in the preferred implementation, and its use is assumed through the rest of this disclosure.
1.2.1 Nearest-Neighbor Interconnects
1.2.1.1 Direct Interconnects
In the direct interconnect, all logic chips are directly connected to each other in a regular array, without the use of interconnect chips. The interconnect consists only of electrical connections among logic chips. Many different patterns of interconnecting logic chips are possible. In general, the pins of one logic chip are divided into groups. Each group of pins is then connected to another logic chip's like group of pins, and so forth, for all logic chips. Each logic chip only connects with a subset of all logic chips, those that are its nearest neighbors, in a physical sense, or at least in the sense of the topology of the array.
All input design nets that connect logic on more than one logic chip either connect directly, when all those logic chips are directly connected, or are routed through a series of other logic chips, with those other logic chips taking on the function of interconnect chips, passing logical signals from one I/O pin to another without connection to any of that chip's realized logic. Thus, any given logic chip will be configured for its share of the design's logic, plus some interconnection signals passing through from one chip to another. Non-logic chip resources which cannot fulfill interconnection functions, are connected to dedicated logic chip pins at the periphery of the array, or tangentially to pins which also interconnect logic chips.
A specific example, shown in FIG. 2, has logic chips laid out in a row-and-column 2-dimensional grid, each chip having four groups of pins connected to neighboring logic chips, north, south, east, and west, with memory, I/O and user-supplied devices connected at the periphery.
This interconnect can be extended to more dimensions, beyond this two-dimensional example. In general, if `n` is the number of dimensions, each logic chip's pins are divided into 2*n groups. Each logic chip connects to 2*n other logic chips in a regular fashion. A further variation is similar, but the sizes of the pin groups are not equal. Depending on the number of logic chips and the numbers of pins on each one, a dimension and set of pin group sizes is chosen that will minimize the number of logic chips intervening between any two logic chips while providing enough interconnections between each directly neighboring pair of chips to allow for nets which span only those two chips. Determining how to configure the logic chips for interconnect is done together with determining how to configure them for logic. To configure the logic chips:
1) Convert the design's logic into logic chip primitive form, as described in the primitive conversion section.
2) Partition and place the logic primitives in the logic chips. In addition to partitioning the design into sub-networks which each fit within a logic chip's logic capacity, the sub-networks should be placed with respect to each other so as to minimize the amount of interconnect required. Use standard partitioning and placement tool methodology, such as that used in a gate-array or standard-cell chip automatic partitioning and placement tool ("Gate Station Reference Manual", Mentor Graphics Corp., 1987), to determine how to assign logic primitives to logic chips so as to accomplish the interconnect. Since that is a well-established methodology, it is not described further here.
3) Route the interconnections among logic chips, that is, assign them to specific logic chips and I/O pin interconnections, using standard routing tool methodology, such as that used in a gate-array or standard-cell chip automatic routing tool ("Gate Station Reference Manual", Mentor Graphics Corp., 1987), to determine how to configure the chips so as to accomplish the interconnect. Since that is a well-established methodology as well, it is not described further here, except in terms of how it is applied to the interconnection problem. The array of logic chips is treated with the same method as a single large gate array or standard-cell chip, with each partitioned logic sub-network corresponding to a large gate array logic macro, and the interconnected logic chip I/O pins defining wiring channels available for routing. Specifically, there are as many channels in each routing direction as there are pins in each group of interconnected logic chip I/O pins. Since there are many possibilities for interconnection through the logic chips, the routing is not constrained to use the same channel at each end, just as multiple routing layers remove channel constraints in a gate array.
4) If it is not possible to accomplish an interconnect, due to routing congestion (unavailability of routing channels at some point during the routing process), the design is re-partitioned and/or re-placed using adjusted criteria to relieve the congestion, and interconnect is attempted again.
5) Convert the specifications of which nets occupy which channels into netlist files for the individual logic chips and specific pin assignments for the logic chip signals, according to the correspondence between specific routing channels and I/O pins. Issue these specifications in the form of I/O pin specifications and logic chip internal interconnections, along with the specifications of logic primitives, to the netlist file for each logic chip.
6) Use the logic chip netlist conversion tool to generate configuration files for each logic chip, and combine them into the final Realizer configuration file for the input design.
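Steps 3 and 4 above can be illustrated with a small Python sketch. It is not the routing tool referred to in the text: it assumes a two-dimensional grid of logic chips, two-pin nets only, and a plain breadth-first search, and the names `make_grid` and `route_net` are hypothetical. Each pair of neighboring chips starts with a fixed number of free channels (the size of the pin group joining them); a routed net consumes one channel per hop, and a return value of `None` stands for the congestion case that would trigger re-partitioning or re-placement.

```python
from collections import deque


def make_grid(rows, cols, channels):
    """Free-channel count for every pair of neighboring chips in the grid."""
    free = {}
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                free[frozenset(((r, c), (r + 1, c)))] = channels
            if c + 1 < cols:
                free[frozenset(((r, c), (r, c + 1)))] = channels
    return free


def route_net(rows, cols, free, src, dst):
    """Route one two-pin net from chip src to chip dst.

    Returns the list of chips on a shortest uncongested path and consumes
    one channel on each hop, or returns None if congestion blocks the net.
    """
    prev = {src: None}
    queue = deque([src])
    while queue:
        chip = queue.popleft()
        if chip == dst:
            break
        r, c = chip
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if inside and nxt not in prev and free[frozenset((chip, nxt))] > 0:
                prev[nxt] = chip
                queue.append(nxt)
    if dst not in prev:
        return None  # routing congestion
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    for a, b in zip(path, path[1:]):
        free[frozenset((a, b))] -= 1  # occupy one channel per hop
    return path


free = make_grid(2, 2, channels=1)
first = route_net(2, 2, free, (0, 0), (0, 1))   # direct neighbors
second = route_net(2, 2, free, (0, 0), (0, 1))  # passes through two other chips
third = route_net(2, 2, free, (0, 0), (0, 1))   # no channels left
print(first, second, third)
```

In the second route, the intermediate chips take on the function of interconnect chips, passing the signal from one I/O pin to another without connecting it to their own logic.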
1.2.1.2 Channel-Routing Interconnects
The channel-routing interconnect is a variation of the direct interconnect in which the chips are divided into two kinds: interconnect chips, which are not used for logic and are dedicated only to accomplishing interconnections, and logic chips, which are used exclusively for logic. In particular, logic chips are not directly interconnected to each other, but instead connect only to interconnect chips. In all other respects, the channel-routing interconnect is composed according to the direct interconnect method. Nets which span more than one logic chip are interconnected by configuring a series of interconnect chips, called routing chips, that connect to those logic chips and to each other, such that logical connections are established between the logic chip I/O pins. It is thus used as a configurable `circuit board`.
One example of a channel-routing interconnect is two-dimensional: logic chips are arranged in a row-and-column manner, completely surrounded by routing chips, as shown in FIG. 3. The array is made up of rows entirely composed of routing chips alternating with rows composed of alternating logic and routing chips. In this way, there are unbroken rows and columns of routing chips, surrounding the logic chips. The pins of each chip are broken into four groups, or edges, named "north, east, south and west." The pins of each chip are connected to its four nearest neighbors in a grid-wise fashion: north pins connected with the northern neighbor's south pins, east pins connected with the eastern neighbor's west pins, and so forth.
This model can be extended to more dimensions, beyond the two-dimensional example given above. In general, if `n` is the number of dimensions, each logic chip's pins are divided into 2*n groups. Each logic chip connects to 2*n neighbors. There are (2**n-1) routing chips for each logic chip at the center of the array.
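The ratio of routing chips to logic chips can be checked by enumerating one repeating cell of the array. The sketch below assumes that the row-and-column pattern of FIG. 3 generalizes to `n` dimensions by placing a logic chip wherever every coordinate is odd and a routing chip everywhere else; that generalization, and the function name `count_chips`, are assumptions made for illustration.

```python
from itertools import product


def count_chips(n, size):
    """Count (logic, routing) chips in an n-dimensional block of the array.

    Assumed pattern: a position holds a logic chip when every coordinate
    is odd, and a routing chip otherwise.
    """
    logic = routing = 0
    for pos in product(range(size), repeat=n):
        if all(x % 2 == 1 for x in pos):
            logic += 1
        else:
            routing += 1
    return logic, routing


# One repeating cell spans two positions per dimension.
for n in (2, 3):
    logic, routing = count_chips(n, 2)
    print(n, logic, routing, 2**n - 1)
```

For each repeating cell the count is one logic chip and 2**n - 1 routing chips, which matches the figure given in the text for the center of the array.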
Generalizations of this channel-routing model are used as well, based on the distinction between logic and routing chips. The pins of the logic chips can be broken into any number of groups. The pins of the routing chips can be broken into any number of groups, which need not be the same number as that of the logic chip groups. The logic chips and routing chips need not have the same number of pins. These variations are applied so long as they result in a regular array of logic and routing chips, and any given logic chip only connects with a limited set of its nearest neighbors.
Determining how to configure the interconnect chips is done together with determining how to configure the logic chips, with the same method used for the direct interconnect, with the exception that interconnections between logic chips are only routed through interconnect chips, not through logic chips.
A net's logical signal passes through as many routing chips as are needed to complete the interconnection. Since each routing chip delays the propagation of the signal, the more routing chips a signal must pass through, the slower the signal's propagation delay time through the interconnect. It is desirable in general to partition the logic design and place the partitions onto specific logic chips in such a way as to minimize the routing requirements. If it is not possible to accomplish an interconnect, due to routing congestion, the design is re-partitioned and/or re-placed using adjusted criteria to relieve the congestion, and interconnect is attempted again. This cycle is repeated as long as necessary to succeed.
1.2.2 Crossbar Interconnects
1.2.2.1 Full Crossbar Interconnect
The crossbar is an interconnection architecture which can connect any pin with any other pin or pins, without restriction. It is used widely for communicating messages in switching networks in computers and communication devices. An interconnect organized as a full crossbar, connected to all logic chip pins and able to be configured into any combination of pin interconnections, accomplishes the interconnect directly for any input design and logic chip partitioning, since it could directly connect any pin with any other. Unfortunately, there is no practical single device which can interconnect a number of logic chips. The logic board of the preferred embodiment, for example, has 14 logic chips with 128 pins each to be connected, for a total of 1792 pins, far beyond the capability of any practical single chip. It is possible to construct crossbars out of a collection of practical interconnect chips, devices which can be configured to implement arbitrary interconnections among their I/O pins. In the context of crossbar interconnects, they are also called crossbar chips.
A general method of constructing a crossbar interconnect out of practical crossbar chips is to use one crossbar chip to interconnect one logic chip pin with as many other logic chip pins as the crossbar chip has pins. FIG. 4 shows an example, extremely simplified for clarity. Four logic chips, with eight pins each, are to be interconnected. Crossbar chips with nine pins each are used. The left-most column of three crossbar chips connects logic chip 4's pin H with pins of logic chips 1, 2 and 3. The next column connects pin G, and so on to pin A of logic chip 4. There is no need to connect a logic chip pin with other pins on the same logic chip, as that would be connected internally. The next eight columns of crossbar chips interconnect logic chip 3 with logic chips 1 and 2. Logic chip 4 is not included because its pins are connected to logic chip 3's pins by the first eight columns of crossbar chips. The final eight columns interconnect logic chips 1 and 2. A total of 48 crossbar chips are used.
Two nets from an input design are shown interconnected. Net A is driven by logic chip 1, pin D, and received by logic chip 4, pin B. The crossbar chip marked 1 is the one which connects to both of those pins, so it is configured to receive from chip 1, pin D and drive what it receives to chip 4, pin B, thus establishing the logical connection. Net B is driven by chip 2, pin F and received by chip 3, pin G and chip 4, pin G. Crossbar chip 2 makes the first interconnection, and crossbar chip 3 makes the second.
In general, the number of crossbar chips required can be predicted. If there are $L$ logic chips, each with $P_l$ pins, and crossbar chips, which each interconnect one logic chip pin with as many other logic chip pins as possible, have $P_x$ pins:
1) One pin of logic chip 1 must be connected to $(L-1)P_l$ pins on logic chips 2 through $L$. This will require $(L-1)P_l/(P_x-1)$ crossbar chips. Connecting all pins will require $(L-1)P_l^2/(P_x-1)$ crossbar chips.
2) Each pin of logic chip 2 must be connected to $(L-2)P_l$ pins on logic chips 3 through $L$. This will require $(L-2)P_l^2/(P_x-1)$ crossbar chips.
3) Each pin of logic chip $L-1$ must be connected to $P_l$ pins on logic chip $L$. This will require $P_l^2/(P_x-1)$ crossbar chips.
4) $X = (L-1)P_l^2/(P_x-1) + (L-2)P_l^2/(P_x-1) + \cdots + P_l^2/(P_x-1) = (L^2-L)P_l^2/\bigl(2(P_x-1)\bigr)$.
The number of crossbar chips, X, increases as the square of the number of logic chips times the square of the number of pins per logic chip. A crossbar interconnect for the preferred embodiment's logic board (14 logic chips with 128 pins each) would require 11648 crossbar chips with 129 pins each, or 23296 crossbar chips with 65 pins each. Crossbar interconnects are impractically large and expensive for any useful Realizer system.
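As a check on the arithmetic, the following Python sketch sums the per-logic-chip terms of the derivation above. The function name `full_crossbar_chips` and the rounding up of any fractional chip count are assumptions made for illustration; in the three cases quoted in the text the divisions are exact.

```python
def full_crossbar_chips(num_logic_chips, pins_per_logic_chip, pins_per_crossbar_chip):
    """Crossbar chips needed by the full crossbar construction of FIG. 4.

    Each pin of logic chip k is joined to all pins of the later logic
    chips, using crossbar chips that each serve (Px - 1) of those pins.
    """
    fanout = pins_per_crossbar_chip - 1
    total = 0
    for k in range(1, num_logic_chips):
        pins_to_reach = (num_logic_chips - k) * pins_per_logic_chip
        chips_per_pin = -(-pins_to_reach // fanout)  # ceiling division
        total += pins_per_logic_chip * chips_per_pin
    return total


print(full_crossbar_chips(4, 8, 9))       # FIG. 4 example: 48
print(full_crossbar_chips(14, 128, 129))  # logic board: 11648
print(full_crossbar_chips(14, 128, 65))   # logic board: 23296
```

Doubling the number of logic chips from 14 to 28 with 129-pin crossbar chips raises the count from 11648 to 48384, which shows the quadratic growth that makes this construction impractical.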
1.2.2.2 Full Crossbar-Net Interconnect
The size of a crossbar interconnect can be reduced by recognizing that the number of design nets to be interconnected can never exceed one half of the total number of logic chip pins. A crossbar-net interconnect is logically composed of two crossbars, each of which connects all logic chip pins with a set of connections, called interconnect nets (ICNs), numbering one half the total number of logic chip pins. Since a crossbar chip which connects a set of logic chip pins to a set of ICNs can also connect from them back to those pins (recalling the generality of interconnect chips), this interconnect is built with crossbar chips each connecting a set of logic chip pins with a set of ICNs.
FIG. 5 shows an example, interconnecting the same four logic chips as in FIG. 4. Crossbar chips with eight pins each are used, and there are 16 ICNs. Each of the 32 crossbar chips connects four logic chip pins with four ICNs. Net A is interconnected by crossbar chip 1, configured to receive from chip 1, pin D and drive what it receives to an ICN, and by crossbar chip 2, which is configured to receive that ICN and drive chip 4, pin B, thus establishing the logical connection. Net B is driven by chip 2, pin F, connected to another ICN by crossbar chip 3, received by chip 3, pin G, via crossbar chip 4, and by chip 4, pin G, via crossbar chip 5.
A crossbar-net interconnect for the preferred embodiment's logic board (14 logic chips with 128 pins each) would require 392 crossbar chips with 128 pins each, or 1568 crossbar chips with 64 pins each. The crossbar-net interconnect uses fewer crossbar chips than the pure crossbar. Its size increases as the product of logic chips and total logic chip pins, which amounts to the square of the number of logic chips. This is better than the pure crossbar, but still not the direct scaling desired.
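The chip counts quoted for the crossbar-net interconnect can be reproduced with a short Python sketch. It assumes, following the FIG. 5 example, that each crossbar chip devotes half of its pins to logic chip pins and half to ICNs, and that one chip is needed for every pairing of a pin group with an ICN group. The function name `crossbar_net_chips` is hypothetical.

```python
def crossbar_net_chips(num_logic_chips, pins_per_logic_chip, pins_per_crossbar_chip):
    """Crossbar chips needed by the crossbar-net construction of FIG. 5."""
    total_pins = num_logic_chips * pins_per_logic_chip
    icns = total_pins // 2                # ICNs number half the logic chip pins
    half = pins_per_crossbar_chip // 2    # assumed even split: pins vs. ICNs
    pin_groups = -(-total_pins // half)   # ceiling division
    icn_groups = -(-icns // half)
    return pin_groups * icn_groups


print(crossbar_net_chips(4, 8, 8))       # FIG. 5 example: 32
print(crossbar_net_chips(14, 128, 128))  # logic board: 392
print(crossbar_net_chips(14, 128, 64))   # logic board: 1568
```

Because both the number of pin groups and the number of ICN groups grow with the number of logic chips, the product still grows as the square of the number of logic chips.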
1.2.2.3 Partial Crossbar Interconnect
The logic chip itself can offer an additional degree of freedom which crossbars do not exploit, because it has the ability to be configured to use any of its I/O pins for a given input or output of the logic network it is being configured for, regardless of the particular network. That freedom allows the possibility of the partial crossbar interconnect, which is the reason it is specified in the definition of the logic chip.
In the partial crossbar interconnect, the I/O pins of each logic chip are divided into proper subsets, using the same division on each logic chip. The pins of each crossbar chip are connected to the same subset of pins from each and every logic chip. Thus, crossbar chip `n` is connected to subset `n` of each logic chip's pins. As many crossbar chips are used as there are subsets, and each crossbar chip has as many pins as the number of pins in the subset times the number of logic chips. Each logic chip/crossbar chip pair is interconnected by as many wires, called paths, as there are pins in each subset.
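The sizing rule in the preceding paragraph can be written as a small Python sketch. The function name `partial_crossbar_size` is hypothetical, and the sketch assumes equal-sized subsets that divide the logic chip's pin count evenly.

```python
def partial_crossbar_size(num_logic_chips, pins_per_logic_chip, pins_per_subset):
    """Return (number of crossbar chips, pins per crossbar chip).

    One crossbar chip serves each pin subset, and it needs one pin for
    every path to every logic chip.
    """
    if pins_per_logic_chip % pins_per_subset != 0:
        raise ValueError("subset size must divide the logic chip pin count")
    num_crossbar_chips = pins_per_logic_chip // pins_per_subset
    pins_per_crossbar_chip = pins_per_subset * num_logic_chips
    return num_crossbar_chips, pins_per_crossbar_chip


# Four logic chips with eight pins each, divided into subsets of two pins.
print(partial_crossbar_size(4, 8, 2))  # (4, 8)
```

With the subset size fixed, adding logic chips leaves the number of crossbar chips unchanged and only widens each crossbar chip, so the total number of crossbar chip pins grows directly with the total number of logic chip pins.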
Since each crossbar chip is connected to the same subset of pins on each logic chip, an interconnection from an I/O pin in one subset of pins on one logic chip to an I/O pin in a different subset of pins on another logic chip cannot be configured. This is avoided by interconnecting each net using I/O pins from the same subset of pins on each of the logic chips to be interconnected, and configuring the logic chips accordingly. Since the logic chip can be configured to use any I/O pin may be assigned to the logic configured in a logic chip which is connected to a net, one I/O pin is as good as another.
The general pattern is shown in FIG. 6. Each line connecting a logic chip and a crossbar chip in this figure represents a subset of the logic chip pins. Each crossbar chip is connected to a subset of the pins of every logic chip. Conversely, this implies that each logic chip is connected to a subset of the pins of every crossbar chip. The number of crossbar chips need not equal the number of logic chips, as it happens to in these examples. It does not in the preferred implementation.
FIG. 7 shows an example, interconnecting the same four logic chips as in FIGS. 1 and 2. Four crossbar chips with eight pins each are used. Each crossbar chip connects to the same two pins of each logic chip. Crossbar chip 1 is connected to pins A and B of each of logic chips 1 through 4. Crossbar chip 2 is connected to all pins C and D, chip 3 to all pins E and F, and chip 4 to all pins G and H.
Design net A was received on pin B of logic chip 4 in the previous examples, but there is no crossbar chip or chips which can interconnect this with the driver on pin D of logic chip 1. Since any I/O pin may be assigned to the logic configured in logic chip 4 which receives net A, pin C is as good as pin B, which may then be used for some other net. Consequently, net A is received by pin C instead, and the interconnection is accomplished by configuring crossbar chip 2. Design net B is received by chip 3, pin G, and by chip 4, pin G, but there is no crossbar chip or chips which can interconnect this with the driver on pin F of logic chip 2. Net B is driven by pin H instead, and the interconnection is accomplished by configuring crossbar chip 4.
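The pin reassignment in the FIG. 7 example can be sketched in a few lines of Python. This is only an illustrative model, not part of the described system: the names `crossbar_for_pin`, `can_connect` and `pick_compatible_pin` are hypothetical, and the lists of free pins passed in below are assumed for the sake of the example.

```python
PINS = "ABCDEFGH"      # the eight I/O pins of each logic chip in FIG. 7
SUBSET_SIZE = 2        # each crossbar chip serves two pins of every logic chip


def crossbar_for_pin(pin):
    """Return the 1-based crossbar chip that serves this pin on every logic chip."""
    return PINS.index(pin) // SUBSET_SIZE + 1


def can_connect(pin_a, pin_b):
    """Two pins on different logic chips can be joined only via a shared crossbar chip."""
    return crossbar_for_pin(pin_a) == crossbar_for_pin(pin_b)


def pick_compatible_pin(fixed_pin, free_pins):
    """Pick the first free pin (hypothetical list) lying in the same subset as fixed_pin."""
    for pin in free_pins:
        if can_connect(fixed_pin, pin):
            return pin
    return None


# Net A: driver on pin D of logic chip 1, receiver originally on pin B of logic chip 4.
print(can_connect("D", "B"))                  # False: D is on crossbar 2, B on crossbar 1
print(pick_compatible_pin("D", ["B", "C"]))   # C, reached through crossbar chip 2

# Net B: receivers on pin G of logic chips 3 and 4, driver originally on pin F of chip 2.
print(can_connect("F", "G"))                  # False: F is on crossbar 3, G on crossbar 4
print(pick_compatible_pin("G", ["F", "H"]))   # H, reached through crossbar chip 4
```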
The partial crossbar interconnect is used in the preferred embodiment. Its logic board consists of 14 logic chips, each with 128 pins, interconnected by 32 crossbar chips with 56 pins each. Logic chip pins are divided into 32 proper subsets of four pins each, and the pins of each crossbar chip are divided into 14 subsets of four pins each. Each logic chip/crossbar chip pair is interconnected by four paths, as crossbar chip `n` is connected to subset `n` of each logic chip's pins.
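The sizing rule stated above (as many crossbar chips as subsets, each with subset size times logic chip count pins) can be checked with a short Python sketch. The function name `partial_crossbar_size` is an assumption made for illustration.

```python
def partial_crossbar_size(logic_chips, pins_per_logic_chip, subset_size):
    """Return (number of crossbar chips, pins per crossbar chip)."""
    if pins_per_logic_chip % subset_size != 0:
        raise ValueError("subset size must divide the logic chip pin count")
    crossbar_chips = pins_per_logic_chip // subset_size   # one crossbar chip per subset
    pins_per_crossbar = subset_size * logic_chips         # subset pins from every logic chip
    return crossbar_chips, pins_per_crossbar


print(partial_crossbar_size(14, 128, 4))   # preferred embodiment: (32, 56)
print(partial_crossbar_size(4, 8, 2))      # FIG. 7 example: (4, 8)
```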
The partial crossbar uses the fewest crossbar chips of all crossbar interconnects. Its size increases directly as the total number of logic chip pins increases. This is directly related to the number of logic chips and thus logic capacity, which is the desired result. It is fast, in that all interconnections pass through only one interconnect chip. It is relatively easy to use, since it is regular, its paths can be represented in a table, and determining how to establish a particular interconnect is simply a matter of searching that table for the best available pair of paths.
1.2.2.4 Capability of the Partial Crossbar Interconnect
Partial crossbar interconnects cannot handle as many nets as full crossbars can. The partial crossbar interconnect will fail to interconnect a net when the only I/O pins not already used for other nets on the source logic chip go to crossbar chips whose paths to the destination logic chip are likewise full. The destination may have pins available, but in such a case they go to other crossbars with full source pins, and there is no way to get from any of those crossbars to the first.
The capacity of a partial crossbar interconnect depends on its architecture. At one logical extreme, there would be only one logic chip pin subset, and one crossbar would serve all pins. Such an arrangement has the greatest ability to interconnect, but is the impractical full crossbar. At the other logical extreme, the subset size is one, with as many crossbar chips as there are pins on a logic chip. This will have the least ability to interconnect of all partial crossbars, but that ability could still be enough. In between are architectures where each crossbar chip serves two, three, or more pins of each logic chip. More interconnect ability becomes available as the crossbar chip count drops and the pin count per crossbar chip increases.
This variation derives from the fact, noted earlier, that there may be free logic chip pins which cannot be interconnected because they are served by different crossbar chips. The fewer and wider the crossbar chips, the less commonly this will crop up. The full crossbar can interconnect all pins in any pattern, by definition.
As a simple example of the difference, suppose there are three logic chips, numbered 1, 2 and 3, with three pins each, and there are four nets, A, B, C and D. Net A connects logic chips 1 and 2, B connects 1 and 3, C connects 2 and 3, and D connects logic chips 1 and 2. In FIGS. 8a and 8b, the pins of each logic chip are shown as a row of cells, and each crossbar chip covers as many columns as the number of pins it serves.
In the first case (FIG. 8a), we use three crossbar chips, numbered 1, 2 and 3, which are each one pin wide. Each crossbar chip can only accommodate one net: crossbar chip 1 is programmed to interconnect net A, crossbar 2 connects net B, and crossbar chip 3 connects net C. Net D is left unconnected, even though there are free logic chip pins available. In the second case (FIG. 8b), a full crossbar which is three pins wide is used instead of crossbar chips 1, 2 and 3, and net D may be connected.
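The FIG. 8a and 8b comparison can be reproduced with a small routing model. The sketch below is an assumption-laden illustration: it uses a simple greedy first-fit search over crossbar chips, which is not necessarily the search used by the system described here, and the function name `route` and the zero-based chip numbering are hypothetical.

```python
def route(nets, num_logic_chips, crossbar_widths):
    """Greedy first-fit routing (an assumed strategy).

    nets: list of (name, tuple of zero-based logic chip indices).
    crossbar_widths: paths per logic chip offered by each crossbar chip.
    Returns {net name: crossbar index, or None if the net cannot be routed}.
    """
    # free[x][c] = unused paths between crossbar chip x and logic chip c
    free = [[width] * num_logic_chips for width in crossbar_widths]
    result = {}
    for name, chips in nets:
        result[name] = None
        for x, paths in enumerate(free):
            if all(paths[c] > 0 for c in chips):
                for c in chips:
                    paths[c] -= 1      # consume one path to each logic chip on the net
                result[name] = x
                break
    return result


# Logic chips 1, 2 and 3 of the text are indices 0, 1 and 2 here.
nets = [("A", (0, 1)), ("B", (0, 2)), ("C", (1, 2)), ("D", (0, 1))]

narrow = route(nets, 3, [1, 1, 1])   # FIG. 8a: three crossbar chips, one pin wide
full = route(nets, 3, [3])           # FIG. 8b: one crossbar, three pins wide

print(narrow)   # net D maps to None: it cannot be routed
print(full)     # all four nets are routed through crossbar 0
```

In the narrow case logic chip 1 still has a free pin on the third crossbar chip and logic chip 2 has one on the second, but no single crossbar chip has free paths to both, which is the failure mode described above.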
Analysis and computer modeling have been conducted on the number of input design nets which can be interconnected by different partial crossbar interconnect architectures. Results indicate that a narrow partial crossbar is nearly as effective as a wide one or even a full crossbar. For example, the interconnect used on the logic board in the preferred implementation (14 128-pin logic chips, 32 56-pin crossbar chips) showed 98% of the interconnect capacity that a full crossbar would have.
It is extremely rare for real input designs to demand the maximum available number of multi-logic-chip nets and logic chip pins, as was assumed in the modeling. Real designs will nearly always have fewer nets than the maximum possible, and fewer than the average number of nets connected by the partial crossbar in the above model, usually substantially fewer. This is ensured by using a small proportion more logic chip pins and crossbar chips than would be absolutely necessary to support the logic capacity, thus ensuring that real designs are nearly always interconnectable by a narrow partial crossbar.
Narrow crossbar chips are much smaller, and therefore less expensive, pin-for-pin, than wide ones. Since they offer nearly as much interconnectability, they are preferred.
1.2.3 Interconnecting Tri-State Nets
An important difference between an active interconnect, such as the partial crossbar interconnect, and a passive one, such as actual wire, is that the active interconnect is unidirectional. Each interconnection actually consists of a series of drivers and receivers at the chip boundaries, joined by metal and traces. Normal nets have a single driver, and may be implemented with fixed drivers and receivers in the active interconnect. Some nets in actual designs are tri-state, with several tri-state drivers, as shown in FIG. 9.
At any given time, a maximum of one driver is active, and the others are presenting high impedance to the net. All receivers see the same logic level at all times (neglecting propagation delays).
1.2.3.1 Sum of Products Replaces Tri-State Net
If the entire net is partitioned into the same logic chip, the network may be replaced by a two-state sum of products, or multiplexer, equivalent, as shown in FIG. 10.
When there are no active enables, this network will output a logic low. Often tri-state nets are passively pulled high. When necessary, the sum of products is made to output a logic high when not enabled by inverting the data input to each AND, and inverting the final summing gate output. When more than one enable is active, the result is the sum (OR) of all inputs. This is acceptable, as the behavior of real tri-state drivers is undefined when more than one is enabled with different data.
FIGS. 11a and 11b show both types of networks: "floating low" and "floating high."
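The two sum of products forms can be modeled as Boolean functions. The following Python sketch is illustrative only; the function names `floating_low` and `floating_high` are assumptions, and each driver is represented as an (enable, data) pair.

```python
def floating_low(drivers):
    """Sum of products: OR over (enable AND data). Outputs low when nothing is enabled."""
    return any(enable and data for enable, data in drivers)


def floating_high(drivers):
    """Same network with each data input inverted and the summing output inverted.

    Outputs high when nothing is enabled, modeling a passively pulled-up net.
    """
    return not any(enable and not data for enable, data in drivers)


idle = [(False, True), (False, False)]        # no driver enabled
one_low = [(True, False), (False, True)]      # first driver enabled, driving low
one_high = [(False, False), (True, True)]     # second driver enabled, driving high

print(floating_low(idle), floating_high(idle))          # False True
print(floating_low(one_low), floating_high(one_low))    # False False
print(floating_low(one_high), floating_high(one_high))  # True True
```

With exactly one driver enabled, both forms pass that driver's data through unchanged; they differ only in the level presented when no driver is enabled.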
The primitive conversion part of the Realizer system's design conversion system makes the sum of products substitution, because the Xilinx LCA, used for the logic and crossbar chips in the preferred implementation, does not support tri-state drive uniformly on all nets. Tri-state drivers are available on all I/O pins at the boundary of the LCA. A limited number of tri-state drivers are available internally in the XC3000 series LCAs, only on a small number of internal interconnects spaced across the chip, each of which serves only a single row of CLBs. Mapping tri-state nets onto those interconnects would add another constraint to partitioning, and could constrain the freedom of CLB placement on the LCA. At the same time, tri-state connections with a small number of drivers per net are common in some gate array library cells. Consequently, the sum of products substitution is made when possible to avoid these complexities.
When a tri-state net has been split across more than one logic chip by the partitioning of the design into multiple logic chips, sums of products are used locally to reduce each logic chip's connection to the net to a single driver and/or receiver at the logic chip boundary. FIG. 12 shows two drivers and two receivers collected together. The two drivers are collected by a local sum of products, which then contributes to the overall sum of products, requiring only a single driver connection. Likewise, only a single receiver connection is distributed across two receivers.
Then the active interconnect comes into play. At any given point along a tri-state net, the "direction" of drive depends on which driver is active. While this makes no difference to a passive interconnect, an active interconnect must be organized to actively drive and receive in the correct directions. There are several configurations that accomplish this in the partial crossbar interconnect.
1.2.3.2 Logic Summing Configuration
Three configurations are based on reducing the net to a sum of products. The logic summing configuration places the summing OR gate in one of the logic chips involved, as shown in FIG. 13.
The AND gates which generate the products are distributed in the driving logic chips, each of which needs an output pin. Each receiving logic chip needs an input pin, and the summing logic chip, which is a special case, will need an input pin for each other driver and one output pin. These connections are all unidirectional, involving an OBUF/IBUF pair across each chip boundary. Since there is a higher pin cost for drivers, a driving logic chip should be chosen as the summing chip.
For the sake of clarity, not all LCA primitives involved are shown in these figures. The actual path from a driving input pin through to a receiving output pin includes a CLB and OBUF on the driver, an IBUF/OBUF on the crossbar, an IBUF, a CLB and an OBUF on the summing chip, another IBUF/OBUF on the crossbar, and an IBUF on the receiver. If we call the crossbar IBUF delay Ix, the logic CLB delay Cl, etc., the total datapath delay is Cl+Ol+Ix+Ox+Il+Cl+Ol+Ix+Ox+Il. In a specific case, if the logic chip is an XC3090-70, and the crossbar is an XC2018-70, the maximum total delay is 82 ns, plus internal LCA interconnect delay. The same delay applies to the enable.
If an n-bit bus is to be interconnected, all enables will be the same for each bit of the bus. In this particular configuration, the product gates are in the driving logic chips, the enables stay inside, and the pins required for the bus are just n times that for one bit.
1.2.3.3 Crossbar Summing Configuration
In the crossbar summing configuration, the summing OR gate is placed on the crossbar chip, making use of the fact that the crossbar chips in some embodiments are implemented with ERCGAs, such as LCAs, which have logic available, as shown in FIG. 14.
Each logic chip needs one pin if it is a driver, and/or one pin if it is a receiver. The crossbar chip must have one or more logic elements for the summing gate. Crossbar summing deviates from the practice of putting all logic in the logic chips and none in the crossbar chips, but an important distinction is that the logic placed in the crossbar chip is not part of the realized design's logic. It is only logic which serves to accomplish the interconnection functionality of a tri-state net.
This configuration uses fewer pins than the previous one when there are more than two driving logic chips. An n-bit bus takes n times as many pins. Total delay is reduced: Cl+Ol+Ix+Cx+Ox+Il, or 51 ns max. The enable has the same delay.
1.2.3.4 Bidirectional Crossbar Summing Configuration
The summing gate on the crossbar chip is reached via bidirectional connections in the bidirectional crossbar summing configuration, shown in FIG. 15.
AND gates which allow only the enabled path into the OR gate are provided in the crossbar chip to block feedback latchup paths. A logic chip needs one pin if it is only a receiver, and two pins if it is a driver or both, one for the signal itself and one for the enable output, which is used by the crossbar chip. Reduced interconnect is possible for multi-bit busses by using a single enable for more than one bit. If more than one bit of the bus is interconnected through the same crossbar chip, only one set of enable signals need be provided to that chip. The total datapath delay is Ol+Ix+Cx+Ox+Il, or 42 ns in the preferred LCA embodiment. An additional Cx (10 ns) may be added if the sum of products takes more than one CLB. The enable delay will depend on the enable delay for the OBUFZ, El, instead of the output delay Ol.
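The three datapath delay totals quoted so far (82 ns, 51 ns and 42 ns) can be cross-checked against their delay expressions. The Python sketch below is a consistency check only: the individual values it derives (the logic CLB delay and the sum of the four buffer delays) are implied by the quoted totals rather than stated in the text, and it assumes the same parts and speed grades apply to all three figures.

```python
# Totals quoted in the text, in ns.
LOGIC_SUMMING = 82      # Cl+Ol+Ix+Ox+Il+Cl+Ol+Ix+Ox+Il
CROSSBAR_SUMMING = 51   # Cl+Ol+Ix+Cx+Ox+Il
BIDIR_SUMMING = 42      # Ol+Ix+Cx+Ox+Il
CX = 10                 # crossbar CLB delay, as quoted

# The crossbar summing and bidirectional expressions differ only by Cl.
cl = CROSSBAR_SUMMING - BIDIR_SUMMING       # implied logic CLB delay

# Removing Cx from the bidirectional total leaves the four buffer delays.
buffers = BIDIR_SUMMING - CX                # implied Ol+Ix+Ox+Il

# The logic summing path is two passes of Cl plus the four buffers.
reconstructed = 2 * (cl + buffers)

print(cl, buffers, reconstructed)           # 9 32 82
print(reconstructed == LOGIC_SUMMING)       # True
```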
1.2.3.5 Bidirectional Crossbar Tri-State Configuration
Note that all the configurations specified so far may be used with identical hardware. Only the primitive placement and interconnect vary. Finally, if the crossbar chip supports internal tri-state, the bi-directional crossbar tri-state configuration duplicates the actual tri-state net inside the crossbar chip, shown in FIG. 16.
Each logic chip's actual tri-state driver is repeated onto the crossbar chip's bus, and should be accompanied by an interconnect for the enable signal. The crossbar chip's bus is driven back out when the driver is not enabled. If the LCA were used as a crossbar chip, its internal tri-state interconnects described above would be used. Specifically, there is an IBUF/OBUFZ pair at the logic chip boundary, another IBUF/OBUFZ pair for each logic chip on the crossbar chip boundary, and a TBUF for each logic chip driving the internal tri-state line. Each enable passes through an OBUF and an IBUF. The total enabled datapath delay is Ol+Ix+Tx+Ox+Il, or 39 ns (XC3030-70 LCA crossbar), and the total enable delay is Ol+Ix+TEx+Ox+Il, or 45 ns.
As before, if more than one bit of the bus is interconnected through the same crossbar chip, only one set of enable signals need be provided to that chip.
This configuration requires that the crossbar be an LCA or other such ERCGA which has internal tri-state capability, and is subject to the availability of those internal interconnects. Specifically, the XC2000-series LCAs do not have internal tri-state, but the XC3000 parts do. The XC3030 has 80 I/O pins, 100 CLBs, and 20 tri-state-drivable internal `long lines`. Thus a maximum of 20 such tri-state nets could be interconnected by one crossbar chip in this configuration. That could be the interconnect limitation, but only for a small fraction of cases, given the I/O pin limit. The XC3030 is twice as expensive as the XC2018 at this time.
If the hardware allows the tri-state configuration to be used, the other configurations are not precluded, and may be used as well.
1.2.3.6 Summary of All Configurations
This chart summarizes the configurations:
______________________________________
                                            Bi-dir       Bi-dir
                Logic            Crossbar   Crossbar     Crossbar
                Summing          Summing    Summing      Tri-state
______________________________________
Pins/logic chip:
bi-directional  =driving+        2          1 datapath   1 datapath
                receiving                   1 sharable   1 sharable
                                            enb.         enb.
driving-only    1st chip: 0      1          1 datapath   1 datapath
                others: 2                   1 sharable   1 sharable
                                            enb.         enb.
receiving-only  1st non-sum: 2   1          1
                others: 1
Delay: (assuming LCA crossbar chips: + LCA interconnect, 70 MHz LCA chip speed)
datapath        82 ns            51         42           39
enable          82               51         46           45
Resources per chip: (d = number of drivers)
driving-only    2-in AND         2-in AND   0            0
                Sum: d-in OR
receiving-only  0                0          0            0
bi-directional  2-in AND         2-in AND   0            0
crossbar        0                d-in OR    d-in OR      d TBUFs
                                            d 2-in       3-s bus
                                            ANDs
______________________________________
The logic summing configuration is clearly less effective. Crossbar summing is much faster and uses fewer pins, and is almost as simple. Bi-directional crossbar summing is slightly faster still, and offers the possibility of reduced pin count for bidirectional busses, but is more complex and places more demands on the limited logic resources in the crossbar chips. The tri-state configuration offers similar pin count and delay, but requires more expensive crossbar chips.
1.2.3.7 Comparing Plain and Bi-directional Crossbar Summing Configurations
It is useful to test the characteristics of the most efficient configurations. The following chart shows the number of crossbar CLBs and crossbar CLB delays incurred when the plain and bi-directional crossbar summing configurations are used to interconnect a large number of bi-directional nets, and when LCAs are used for crossbar chips. It assumes XC2018-70 crossbar chips are used, which have 72 I/O pins and 100 CLBs available. Each CLB supports up to 4 inputs and up to 2 outputs. Each logic chip is assumed to have a bi-directional connection to the net, with no enable sharing, so each test case uses all 72 I/O pins in the crossbar chip.
______________________________________
                          Crossbar     Bi-dir Crossbar
                          Summing      Summing
______________________________________
18 bi-dir nets serving    9 CLBs       18 CLBs
2 logic chips each        1 Cx         1 Cx
12 bi-dir nets serving    12 CLBs      24 CLBs
3 logic chips each        1 Cx         2 Cx
9 bi-dir nets serving     9 CLBs       27 CLBs
4 logic chips each        1 Cx         2 Cx
6 bi-dir nets serving     12 CLBs      24 CLBs
6 logic chips each        2 Cx         2 Cx
3 bi-dir nets serving     12 CLBs      30 CLBs
12 logic chips each       2 Cx         3 Cx
______________________________________
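The statement that each test case uses all 72 I/O pins of the crossbar chip can be verified directly. The sketch below assumes two crossbar pins per bi-directional logic chip connection, which is what the summary chart gives for both configurations when enables are not shared; the variable names are hypothetical.

```python
CROSSBAR_IO_PINS = 72            # XC2018-70 I/O pins, as stated
PINS_PER_BIDIR_CONNECTION = 2    # per logic chip on a net, with no enable sharing

# (number of bi-directional nets, logic chips served by each net)
cases = [(18, 2), (12, 3), (9, 4), (6, 6), (3, 12)]

pins_used = [nets * chips * PINS_PER_BIDIR_CONNECTION for nets, chips in cases]
print(pins_used)                                        # [72, 72, 72, 72, 72]
print(all(p == CROSSBAR_IO_PINS for p in pins_used))    # True
```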
The bi-directional crossbar summing configuration uses up to 2.5 times as many CLBs, which increases the possibility that the crossbar chip won't route, or that the internal interconnect delays will be higher, although it stays well short of the 100 CLBs available. In exchange, the unidirectional configuration puts more gates on the logic chips, although the logic chips are in a better position to handle extra gates. The bi-directional configuration incurs extra Cx delays more often, which can offset its speed advantage. The preferred embodiment of the Realizer system uses the crossbar summing configuration for all tri-state nets.
1.2.4 System-Level Interconnect
The natural way to package a set of logic chips interconnected by crossbar chips is on a single circuit board. When a system is too large to fit on a single board, then the boards must be interconnected in some way, with a system-level interconnect. It is impractical to spread a single partial crossbar interconnect and its logic chips across more than one circuit board because of the very broad distribution of paths. For example, suppose a complex of 32 128-pin logic chips and 64-pin crossbar chips was to be split across two boards, 16 logic chips and 32 crossbars on each. If it was cut between the logic chips and the crossbar chips, then all 4096 interconnect paths between logic chips and crossbar chips would have to pass through a pair of backplane connectors. If it is cut the other way, `down the middle` with 16 logic chips and 32 crossbar chips on each board, then all the paths which connect logic chips on board 1 to crossbars on board 2 (16 logic * 64 pins=1024), and vice versa (another 1024, totalling 2048), would have to cross.
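The path counts in this example follow from simple arithmetic, sketched below in Python. The variable names are hypothetical, and the sketch assumes, as the 16 * 64 figure implies, that the crossbar chips are split evenly so that half of each logic chip's pins go to crossbars on the other board.

```python
LOGIC_CHIPS = 32
PINS_PER_LOGIC_CHIP = 128

# Cut between logic chips and crossbar chips: every path crosses the backplane.
paths_cut_between = LOGIC_CHIPS * PINS_PER_LOGIC_CHIP

# Cut down the middle: 16 logic chips per board, half of each chip's pins
# connect to crossbar chips on the other board (assumed even split).
chips_per_board = LOGIC_CHIPS // 2
pins_to_other_board = PINS_PER_LOGIC_CHIP // 2
paths_one_direction = chips_per_board * pins_to_other_board
paths_cut_middle = 2 * paths_one_direction

print(paths_cut_between)     # 4096
print(paths_one_direction)   # 1024
print(paths_cut_middle)      # 2048
```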
A further constraint is that a single such interconnect is not expandable. By definition, each crossbar chip has connections to all logic chips. Once configured for a particular number of logic chips, more may not be added.
Instead, the largest complex of logic and crossbar chips which can be packaged together on a circuit board is treated as a module, called a logic board, and multiples of these are connected by a system-level interconnect. To provide paths for interconnecting nets which span more than one board, additional off-board connections are made to additional I/O pins of each of the crossbar chips of each logic board, establishing logic board I/O pins (FIG. 17). The crossbar chip I/O pins used to connect to logic board I/O pins are different from the ones which connect to the board's logic chip I/O pins.
1.2.4.1 Partial Crossbar System-Level Interconnects
One means of interconnecting logic boards is to reapply the partial crossbar interconnect hierarchically, treating each board as if it were a logic chip, and interconnecting board I/O pins using an additional set of crossbar chips. This partial crossbar interconnects all the boards in a box. A third interconnect is applied again to interconnect all the boxes in a rack, etc. Applying the same interconnect method throughout has the advantage of conceptual simplicity and uniformity with the board-level interconnect.
To distinguish among crossbar chips in a Realizer system, the partial crossbar interconnect which interconnects logic chips is called the X-level interconnect, and its crossbar chips are called Xchips. The interconnect which interconnects logic boards is called the Y-level interconnect, and its crossbar chips are called Ychips. In the Y-level interconnect, the I/O pins of each logic board are divided into proper subsets, using the same division on each logic board. The pins of each Ychip are connected to the same subset of pins from every logic board. As many Ychips are used as there are subsets, and each Ychip has as many pins as the number of pins in the subset times the number of logic boards.
Likewise, additional off-box connections are made to additional I/O pins of each of the Ychips, establishing box I/O pins, which are divided into proper subsets, using the same division on each box (FIG. 18). The pins of each Zchip are connected to the same subset of pins from every box. As many Zchips are used as there are subsets, and each Zchip has as many pins as the number of pins in the subset times the number of boxes. This method of establishing additional levels of partial crossbar interconnects can be continued as far as needed.
When the input design is partitioned, the limited number of board I/O pins through which nets may pass on and off a board is a constraint which is observed, just as a logic chip has a limited number of I/O pins. In a multiple box Realizer system the limited number of box I/O pins is observed, and so on. The interconnect's symmetry means optimizing placement across chips, boards, or cardcages is not necessary, except so far as special facilities, such as design memories, are involved.
Bidirectional nets and busses are implemented using one of the methods discussed in the tri-state section, such as the crossbar summing method, applied across each level of the interconnect hierarchy spanned by the net.
A specific example is the preferred embodiment:
The partial crossbar interconnect is used hierarchically at three levels across the entire hardware system.
A logic board consists of up to 14 logic chips, with 128 interconnected I/O pins each, and an X-level partial crossbar composed of 32 Xchips. Each Xchip has four paths to each of the 14 Lchips (56 total), and eight paths to each of two Ychips, totalling 512 logic board I/O pins per board.
A box contains one to eight boards, with 512 interconnected I/O pins each, and a Y-level partial crossbar composed of 64 Ychips. Each Ychip has eight paths to an Xchip on each board via logic board I/O pins, and eight paths to one Zchip, totalling 512 box I/O pins per box.
A rack contains one to eight boxes, with 512 interconnected I/O pins each, and a Z-level partial crossbar composed of 64 Zchips. Each Zchip has eight paths to a Ychip in each box via box I/O pins.
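The pin and path counts of this three-level hierarchy can be checked against each other. The Python sketch below uses only the figures given above; the per-chip pin totals it prints are derived for a fully populated system and are not stated in the text, and all names are hypothetical.

```python
# X level (one logic board)
XCHIPS, LCHIPS = 32, 14
PATHS_X_TO_L = 4          # paths from each Xchip to each Lchip
YCHIPS_PER_XCHIP = 2      # each Xchip reaches two Ychips
PATHS_X_TO_Y = 8          # paths from each Xchip to each of those Ychips

lchip_pins = XCHIPS * PATHS_X_TO_L                       # interconnected pins per Lchip
board_io = XCHIPS * YCHIPS_PER_XCHIP * PATHS_X_TO_Y      # logic board I/O pins
xchip_pins = LCHIPS * PATHS_X_TO_L + YCHIPS_PER_XCHIP * PATHS_X_TO_Y   # derived

# Y level (one box of up to eight boards)
YCHIPS, MAX_BOARDS = 64, 8
PATHS_Y_TO_BOARD = 8      # paths from each Ychip to one Xchip on each board
PATHS_Y_TO_Z = 8          # paths from each Ychip to its Zchip

box_io = YCHIPS * PATHS_Y_TO_Z                           # box I/O pins
ychip_pins = MAX_BOARDS * PATHS_Y_TO_BOARD + PATHS_Y_TO_Z   # derived, full box

# Z level (one rack of up to eight boxes)
ZCHIPS, MAX_BOXES = 64, 8
PATHS_Z_TO_BOX = 8        # paths from each Zchip to one Ychip in each box
zchip_pins = MAX_BOXES * PATHS_Z_TO_BOX                  # derived, full rack

print(lchip_pins, board_io, box_io)                      # 128 512 512
print(YCHIPS * PATHS_Y_TO_BOARD == board_io)             # True: Ychips cover every board pin
print(ZCHIPS * PATHS_Z_TO_BOX == box_io)                 # True: Zchips cover every box pin
print(xchip_pins, ychip_pins, zchip_pins)                # 72 72 64
```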
1.2.4.2 Bidirectional Bus System-Level Interconnects
Computer hardware practice inspires another method of system-level interconnection of logic boards, using a backplane of bi-directional busses. Each logic board is provided with I/O pins, as before, and each board's I/O pin is connected to the like I/O pins of all the other boards in the box by a bus wire (FIG. 19).
Some logic board I/O pins are wasted, i.e. unable to interconnect design nets, since the use of a bus wire for interconnecting one design net blocks off the use of pins connected to that wire on all the other boards sharing the bus. The maximum number of design nets which can be interconnected is equal to the number of bus wires, which equals the number of I/O pins per board. For a specific example, suppose eight boards share a common interconnect bus, with 512 bus wires connecting the 512 I/O pins of each board (FIG. 20).
Assuming different distributions of 2, 3, 4, 5, 6, 7 and 8-board nets, analysis shows that while the average number of nets connecting to each board is 512 in each case, the boards and bus should be up to 1166 pins wide to allow for all the nets. This can be partially mitigated by keeping the number of boards on a single backplane small. But the maximum number of boards interconnected with one set of bidirectional busses is limited. To accommodate larger systems more efficiently, groups of busses are interconnected hierarchically.
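The way a shared bus wastes board I/O pins can be illustrated with a small count. In the Python sketch below the function name `bus_usage` and the three example nets are hypothetical; the model simply charges one bus wire per net and counts the pins on non-participating boards as wasted.

```python
def bus_usage(nets, num_boards):
    """Return (bus wires used, board pins carrying a net, board pins wasted).

    nets: list of sets of board indices; each net occupies one bus wire,
    which is connected to the like pin on every board.
    """
    wires_used = len(nets)
    pins_used = sum(len(boards) for boards in nets)
    pins_wasted = wires_used * num_boards - pins_used
    return wires_used, pins_used, pins_wasted


# Three hypothetical two-board nets on an eight-board backplane.
example_nets = [{0, 1}, {2, 3}, {0, 7}]
print(bus_usage(example_nets, 8))   # (3, 6, 18)
```

In this made-up case three wires carry six useful pin connections and block eighteen other pins, which shows why nets spanning few boards are costly on a wide shared bus.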
The first example shown in FIG. 21 has two sets of busses, X0 and X1, connecting four boards each. The X-level busses are interconnected by another bus, Y. Each wire in an X bus can be connected to its counterpart in Y by a reconfigurable bidirectional transceiver, whose configuration determines whether the X and Y wires are isolated, driven X to Y, or Y to X. When a net connects only the left set of boards or the right set of boards, then only one or the other of the X-level busses is used. When boards on both sides are involved, then a wire in each of X0 and X1 is used, and these wires are interconnected by a wire in Y, via the transceivers. Each board should have as many I/O pins as the width of one of the X-level busses.
If the interconnection through Y is to be bi-directional, that is, driven from either X0 or X1, then an additional signal should be passed from X0 and X1 to dynamically control the transceiver directions.
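The three transceiver states and their use for a net in the two-level arrangement of FIG. 21 can be sketched as follows. This is an illustrative model under stated assumptions: the board numbering (boards 0 to 3 on X0, boards 4 to 7 on X1), the names `Transceiver` and `configure`, and the idea of a single known driving board are all hypothetical.

```python
from enum import Enum


class Transceiver(Enum):
    ISOLATED = "X and Y wires isolated"
    X_TO_Y = "X wire drives Y wire"
    Y_TO_X = "Y wire drives X wire"


X0_BOARDS = {0, 1, 2, 3}   # assumed numbering of the left set of boards
X1_BOARDS = {4, 5, 6, 7}   # assumed numbering of the right set of boards


def configure(net_boards, driver_board):
    """Return the transceiver setting on the X0-Y and X1-Y links for one net."""
    if driver_board not in net_boards:
        raise ValueError("driver board must be on the net")
    spans_both = bool(net_boards & X0_BOARDS) and bool(net_boards & X1_BOARDS)
    if not spans_both:
        # The net stays on one X-level bus, so Y is not used.
        return {"X0": Transceiver.ISOLATED, "X1": Transceiver.ISOLATED}
    if driver_board in X0_BOARDS:
        return {"X0": Transceiver.X_TO_Y, "X1": Transceiver.Y_TO_X}
    return {"X0": Transceiver.Y_TO_X, "X1": Transceiver.X_TO_Y}


print(configure({0, 1}, 0))   # both isolated: only X0 is used
print(configure({1, 5}, 5))   # X1 drives Y, and Y drives X0
```

For a bi-directional net the driving side changes over time, so the fixed result returned here would instead be selected dynamically by the additional control signal mentioned above.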
This interconnect has been analyzed to show its capability for interconnecting nets among the boards, making the same net pin count and I/O pin count assumptions as above. While the single-level method requires the same width as the total number of all nets, breaking it into two decreases the maximum width required by 10 to 15%.
The maximum amount of hierarchy has only two boards or groups of boards per bus (FIG. 22).
Bidirectional bus interconnects are simple and easy to build, but they are expensive, because a large number of logic board I/O pins are wasted by connecting to other boards' nets. Introducing hierarchy and short backplanes to avoid this proves to have very little effect. In addition, the introduction of bidirectional transceivers removes a speed and cost advantage that the single-level backplane bus interconnect had over a partial crossbar. Consequently, partial crossbars are used in the system-level interconnect of the preferred embodiment.
1.3 Special-Purpose Elements
Special-purpose elements are hardware elements which contribute to the realization of the input design, and which are installed in Lchip locations on the logic board of the preferred embodiment, but which are not combinational logic gates or flip-flops, which are configured into logic chips.
1.3.1 Design Memory
Most input designs include memory. It would be ideal if logic chips included memory. Current logic chip devices don't, and even if they did, there would still be a need for megabyte-scale main memories which one would never expect in a logic chip. Therefore, design memory devices are included in the Realizer system.
1.3.1.1 Design Memory Architecture
The architecture of a design memory module is derived from requirements:
a) Since it is part of the design, it should be freely interconnectable with other components.
b) It should allow freedom in assigning data, address and control inputs and outputs to interconnect paths, as the logic chip does, to allow successful interconnection.
c) A variety of configurations allowing one or more design memories, with different capacities and bit widths, and either common or separate I/O, should be available.
d) It should be accessible by the host interface to allow debugger-type interaction with the design.
e) It should be static, not dynamic, so the design may be stopped, started or run at any clock speed, at will.
The general architecture of a memory module that satisfies these requirements is shown in FIG. 23.
To support interconnectability with the design, and flexibility of physical composition of the Realizer system, the memory module is designed to plug into an Lchip socket, connected to the same interconnect and other pins as the logic chip it replaces. As many modules as needed are installed.
RAM chips are not directly connected to the interconnect, mainly because their data, address and control functions are fixed to specific pins. Since the success of the partial crossbar interconnect depends on the logic chip's ability to freely assign internal interconnects to I/O pins, non-logic chip devices installed in a logic chip's place should have a similar capability. To accomplish this, and to provide for other logic functions in the memory module, logic chips are installed in the memory module, interconnecting the RAM chips with the crossbar's Xchips.
They are configured to interconnect specific RAM pins with arbitrarily chosen Xchip pins, using the same L-X paths used by the logic chip whose place the memory module has taken. More than one logic chip is used per module because of the large numbers of RAM pins and L-X paths to be connected.
An additional function of the memory module's logic chips is to provide it with configurability and host accessibility. Address, data and control paths are configured through the logic chips to connect the RAM chips in a variety of capacities, bit widths and input/output structures. The memory module may be configured as one large memory or several smaller ones. By connecting each of these logic chips to the host interface bus, and by configuring bus interface logic in them, functionality is realized which allows the host processor to randomly access the RAMs, so a user's host computer program, such as a debugger, can inspect and modify the memory contents. Examples of these logic structures are shown below.
The densest and cheapest available static memory which fulfills the timing requirements of realized designs is chosen for design memory. In the preferred embodiment, that device is the 32K by 8 bit CMOS SRAM, such as the Fujitsu MB84256. It is available at speeds down to 50 ns. Much faster devices offer diminishing returns, as the Realizer system's crossbar chip interconnect delays start to predominate.
Dynamic memory devices are not used because they must be refreshed regularly, which would present problems in the Realizer system. If the input design calls for a dynamic memory, presumably it includes refresh logic. However, since the realized design may not be operating at 100% of design speed, letting the design do the refresh may not be successful. In fact it is desirable to stop the design's operation altogether when debugging. Or, the design may be part of a system which depends for refresh on some other element, not included in the input design. Finally, if the design calls for static memory, refresh of a dynamic design memory would be impractical. A static memory can realize a dynamic memory in the design, as refresh cycles may just be ignored. Thus the design memory is implemented with static devices.
1.3.1.2 Using Logic Chips to Interconnect RAMs with the Crossbar
Ideally, a single logic chip would be used to interconnect RAMs with the X-level crossbar, with enough pins to connect to all RAM signal pins as well as all L-X interconnect paths. Practical Realizer system memory modules require far more pins than a single logic chip can provide. For example, suppose 2 banks of eight 32K by 8 bit RAMs were used in a module with 128 L-X paths. Each RAM bank would have 15 address pins, 8 write enable pins, and 64 data pins, or 87 pins in all. Two banks and the L-X paths would require 302 pins (2 x 87 + 128), plus pins for the host interface bus. This outstrips the pin count of available logic chips by a factor of two, so more than one logic chip must be used. The architecture described here uses a number of small logic chips, which are given specialized functions, some for address and control, and others for the data paths.
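The pin arithmetic in the example above can be checked with a short sketch. The constant and function names are illustrative only; the numbers are those given in the text, and the host interface bus pins are left out because the text does not count them.

```python
# Pin budget for the example memory module described in the text.
# Names are hypothetical; only the figures come from the example.

RAM_ADDRESS_BITS = 15   # 32K words need 15 address pins, bused across a bank
RAM_DATA_BITS = 8       # each RAM is 8 bits wide
RAMS_PER_BANK = 8
BANKS = 2
LX_PATHS = 128


def bank_pins(rams_per_bank: int = RAMS_PER_BANK) -> int:
    """Pins needed to reach one bank: a shared address bus, one write
    enable per RAM, and a separate data byte per RAM."""
    address = RAM_ADDRESS_BITS
    write_enables = rams_per_bank
    data = rams_per_bank * RAM_DATA_BITS
    return address + write_enables + data


def module_pins(banks: int = BANKS, lx_paths: int = LX_PATHS) -> int:
    """Pins a single logic chip would need for all banks plus the L-X
    paths, excluding the host interface bus."""
    return banks * bank_pins() + lx_paths


print(bank_pins())    # 87
print(module_pins())  # 302
```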
1.3.1.2.1 Memory Address Logic Chips
Address and control logic chips are marked "MA0" and "MA1" in FIG. 23. The RAMs are split into banks, one controlled by each MA chip. There are as many MA chips as the maximum number of separate design memories that the module can realize. Each is given its own set of L-X paths to the crossbar, as many paths as needed for one bank's address and control lines, so MA0 and MA1 use different sets of paths. For example, two MA chips, each connected to half the RAMs, allow two independent memories to be realized. If one larger memory is to be realized, the address and control nets are interconnected to both MA chips, using both sets of L-X paths. Each MA chip controls the address inputs of all RAMs in its bank, which are tied together in a single bus. Each MA chip drives the control inputs of each RAM individually, so that data is written into only the addressed RAM(s). Finally, each MA chip is connected to the host interface bus for accessibility, and to a control bus common to all logic chips on this memory module.
FIG. 24 shows in greater detail how an MA chip is connected to the X-level crossbar and to the RAM chips. The MA chip is configured according to the logic and data paths as shown. The full address enters the MA chip from the crossbar. Normally (when the bus interface is inactive), a fraction of address bits corresponding to the number of RAM address bits is passed on to address the RAMs in the bank controlled by this MA chip. The other address bits and the design's write enable drive decoder logic which controls the write enable signals for each RAM. This logic is configured according to the configuration needed for this design memory. For example, if the design memory has the same bit width as one of the RAMs, when the design asserts its write enable only a single RAM write enable will be asserted, according to the address bits. If the design memory is twice as wide as one chip, then a pair of RAM write enables will be asserted, and so on.
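The write enable decoding described above can be modeled in software as a rough sketch. It assumes a bank of eight 32K by 8 bit RAMs, that the low 15 address bits drive the RAMs, and that the remaining high bits select a group of adjacent RAMs. The bit assignment, the grouping order, and the name `decode_write_enables` are assumptions made for illustration, not details taken from FIG. 24.

```python
RAM_ADDRESS_BITS = 15  # one 32K by 8 bit RAM


def decode_write_enables(address: int, write_enable: bool,
                         rams_in_bank: int = 8,
                         rams_per_word: int = 1) -> tuple[int, list[bool]]:
    """Return (ram_address, per-RAM write enables) for one design access.

    Assumption: the low 15 address bits go to every RAM in the bank and
    the high bits pick which group of `rams_per_word` adjacent RAMs
    holds the addressed word.
    """
    if rams_in_bank % rams_per_word != 0:
        raise ValueError("word width must divide the bank evenly")
    groups = rams_in_bank // rams_per_word

    ram_address = address & ((1 << RAM_ADDRESS_BITS) - 1)
    group = address >> RAM_ADDRESS_BITS
    if group >= groups:
        raise ValueError("address exceeds the configured memory depth")

    # A RAM's write enable is asserted only when the design is writing
    # and the RAM belongs to the addressed group.
    enables = [write_enable and (i // rams_per_word == group)
               for i in range(rams_in_bank)]
    return ram_address, enables


# Design memory as wide as one RAM: a single write enable is asserted.
print(decode_write_enables((3 << 15) | 0x1234, True))
# Design memory twice as wide as one RAM: a pair is asserted.
print(decode_write_enables((3 << 15) | 5, True, rams_per_word=2))
```

In the Realizer system this logic is configured into the MA chip rather than computed in software; the sketch only shows which enables a given configuration would assert.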
If a design memory with more than one write enable, each controlling a subset of the memory's data path width, is desired, several design write enable nets may be used, each operating along the lines described above, with suitable configuration of the decode logic in the MA and MD chips. This is subject to the availability of L-X paths into the MA chip and control bus paths into the MD chips.
The bus interface logic allows the host to access these RAMs via the host interface bus. When this set of RAMs is addressed by the bus, the bus interface switches the address multiplexer ("mux") so that the RAMs are addressed by the host's address instead of the design's. When the host is writing one of the RAMs, the bus interface logic sends a signal to the decoder logic, which uses the address bits not driving the RAMs to assert the appropriate RAM write enable.
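The address mux behavior described above might be sketched as follows. The function name `ma_address_mux`, the `Access` record, and the 15-bit split between RAM address bits and decoder select bits are illustrative assumptions, not the patent's circuit.

```python
from typing import NamedTuple

RAM_ADDRESS_BITS = 15  # assumed: 32K by 8 bit RAMs


class Access(NamedTuple):
    ram_address: int   # bits driven onto the bank's address bus
    select_bits: int   # remaining bits, passed to the write enable decoder
    write: bool        # write request passed to the decoder


def ma_address_mux(design_address: int, design_write: bool,
                   host_address: int, host_write: bool,
                   host_selected: bool) -> Access:
    """Choose between the design's address and the host's address.

    When the bus interface is inactive, the design's address and write
    enable pass through. When the host addresses this set of RAMs, the
    host's address and write request take their place.
    """
    if host_selected:
        address, write = host_address, host_write
    else:
        address, write = design_address, design_write
    mask = (1 << RAM_ADDRESS_BITS) - 1
    return Access(address & mask, address >> RAM_ADDRESS_BITS, write)


# Normal operation: the design's access reaches the RAMs.
print(ma_address_mux(0x8001, True, 0x10002, False, host_selected=False))
# Host access: the mux switches to the host's address.
print(ma_address_mux(0x8001, True, 0x10002, False, host_selected=True))
```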
Finally, some signals are needed to control the data paths in the MD chips. Since the MD chips are not all connected to the same L-X paths as the MA chip(s), they may not have access to the address and control signals from the design. A control bus is connected to all MA and MD chips to allow these signals, and bus interface control signals, to be sent to the MD chips.
1.3.1.2.2 Memory Data Path Logic Chips
MD chips handle the data paths according to a bit-slice organization. Multi-bit bus data paths are interconnected in the Realizer system by being bit-sliced across the crossbar. Busses are spread out across the Xchips, with one or two bits per chip. MD chips are bit-sliced t