United States Patent 5809263
Farmwald, et al. September 15, 1998

Title

Integrated circuit I/O using a high performance bus interface

Abstract

A memory subsystem for storing and retrieving data. At least one memory device includes a bus interface. The memory device has at least one memory section comprised of a plurality of memory cells. The bus interface of the at least one memory device couples the memory device to a bus. The bus comprises a group of controlled impedance transmission lines for carrying substantially all information necessary for a single memory device to receive a transaction request, including a memory transaction request, and for carrying substantially all information necessary for a single memory device to respond to the transaction request. The number of signaling lines is substantially less than the number of bits in the information necessary to request a memory transaction to store or retrieve data from the memory cells. Memory device selection information is time-multiplexed on the bus with other memory transaction request information.


Inventors: Farmwald; Michael (Berkeley, CA), Horowitz; Mark (Palo Alto, CA)
Assignee: Rambus Inc. (Mountain View, CA)
Appl. No.: 762139
Filed: December 9, 1996

Current U.S. Class: 710/305; 711/211
Field of Search: 395/309,280,306,307,497.01,497.03,308; 711/170,171,211,212

U.S. Patent Documents
3633166  January 1972  Picard
3691534  September 1972  Varadi et al.
3740723  June 1973  Beausoleil et al.
3758761  September 1973  Henrion
3771145  November 1973  Wiener
3821715  June 1974  Hoff, Jr. et al.
3882470  May 1975  Hunter
3924241  December 1975  Kronies
3969706  July 1976  Proebsting et al.
3972028  July 1976  Weber et al.
3975714  August 1976  Weber et al.
3983537  September 1976  Parsons et al.
4007452  February 1977  Hoff, Jr.
4038648  July 1977  Chesley
4099231  July 1978  Kotok et al.
4191996  March 1980  Chesley
4205373  May 1980  Shah et al.
4234934  November 1980  Thorsud
4247817  January 1981  Heller
4249247  February 1981  Patel
4263650  April 1981  Bennett et al.
4286321  August 1981  Baker et al.
4306298  December 1981  McElroy
4315308  February 1982  Jackson
4333142  June 1982  Chesley
4354258  October 1982  Sato
4355376  October 1982  Gould
4373183  February 1983  Means et al.
4375665  March 1983  Schmidt
4385350  May 1983  Hansen et al.
4443864  April 1984  McElroy
4449207  May 1984  Kung et al.
4468738  August 1984  Hansen et al.
4470114  September 1984  Gerhold
4480307  October 1984  Budde et al.
4481625  November 1984  Roberts et al.
4481647  November 1984  Gombert et al.
4488218  December 1984  Grimes
4493021  January 1985  Agrawal et al.
4494185  January 1985  Gross et al.
4494186  January 1985  Goss et al.
4500905  February 1985  Shibata
4513370  April 1985  Ziv et al.
4513374  April 1985  Hooks, Jr.
4519034  May 1985  Smith et al.
4566098  January 1986  Gammage et al.
4570220  February 1986  Tetrick et al.
4571672  February 1986  Hatada et al.
4595923  June 1986  McFarland
4608700  August 1986  Kirtley, Jr. et al.
4630193  December 1986  Kris
4635192  January 1987  Ceccon et al.
4646270  February 1987  Voss
4649511  March 1987  Gdula
4649516  March 1987  Chung et al.
4654655  March 1987  Kowalski
4656605  April 1987  Clayton
4660141  April 1987  Ceccon et al.
4675813  June 1987  Locke
4706166  November 1987  Go
4719627  January 1988  Peterson et al.
4745548  May 1988  Blahut
4757473  July 1988  Kurihara et al.
4761799  August 1988  Arragon
4764846  August 1988  Go
4766536  August 1988  Wilson, Jr. et al.
4770640  September 1988  Walter
4775931  October 1988  Dickie et al.
4779089  October 1988  Theus
4785394  November 1988  Fischer
4785396  November 1988  Murphy et al.
4803621  February 1989  Kelly
4811202  March 1989  Schabowski
4818985  April 1989  Ikeda
4831338  May 1989  Yamaguchi
4837682  June 1989  Culler
4858112  August 1989  Puerzer et al.
4860198  August 1989  Takenaka
4862158  August 1989  Keller et al.
4882669  November 1989  Miura et al.
4920486  April 1990  Nielson
4933835  June 1990  Sachs et al.
4937733  June 1990  Gillett, Jr. et al.
4939510  July 1990  Masheff et al.
4940909  July 1990  Mulder et al.
4945471  July 1990  Neches
4947484  August 1990  Twitty et al.
4954992  September 1990  Kumanoya et al.
4965792  October 1990  Yano
4975763  December 1990  Baudouin et al.
4982400  January 1991  Ebersole
4998069  March 1991  Nguyen et al.
4998262  March 1991  Wiggers
5012408  April 1991  Conroy
5021772  June 1991  King et al.
5023488  June 1991  Gunning
5038317  August 1991  Callan et al.
5038320  August 1991  Heath et al.
5051889  September 1991  Fung et al.
5056060  October 1991  Fitch et al.
5063561  November 1991  Kimmo
5077693  December 1991  Hardee et al.
5083260  January 1992  Tsuchiya
5083296  January 1992  Hara et al.
5093807  March 1992  Hashimoto et al.
5107491  April 1992  Chew
5111423  May 1992  Kopec, Jr. et al.
5111464  May 1992  Farmwald et al.
5117494  May 1992  Costes et al.
5121382  June 1992  Yang et al.
5129069  July 1992  Helm et al.
5175831  December 1992  Kumar
5179670  January 1993  Farmwald et al.
5193149  March 1993  Awiszio et al.
5193199  March 1993  Dalrymple et al.
5220673  June 1993  Dalrymple et al.
5226009  July 1993  Arimoto
5247518  September 1993  Takiyasu et al.
5317723  May 1994  Heap et al.
5361277  November 1994  Grover
5371892  December 1994  Peterson et al.
5390149  February 1995  Vogley et al.
5452420  September 1995  Engdahl et al.
Other References
H. Schumacher, "CMOS Subnanosecond True-ECL Output Buffer", IEEE Journal of Solid-State Circuits, vol. 25, No. 1, pp. 150-154 (Feb. 1990).
International Search Report Dated Jul. 8, 1991 for PCT Patent Application No. PCT/US91/02590 filed Apr. 18, 1991.
T. Yang, M. Horowitz, B. Wooley, "A 4-ns 4K × 1 bit Two-Port BiCMOS SRAM", IEEE Journal of Solid-State Circuits, vol. 23, No. 5, pp. 1030-1040 (Oct. 1988).
J. Frisone, "A Classification for Serial Loop Data Communications Systems", Raleigh Patent Operations (Nov. 2, 1972).
A. Khan, "What's the Best Way to Minimize Memory Traffic", High Performance Systems, pp. 59-67 (Sep. 1989).
N. Margulis, "Single Chip RISC CPU Eases System Design", High Performance Systems, pp. 34-36, 40-41, 44 (Sep. 1989).
R. Matick, "Comparison of Memory Chip Organizations vs. Reliability in Virtual Memories", FTCS 12th Annual International Symposium Fault-Tolerant Computing, IEEE Computer Society Fault-Tolerant Technical Committee, pp. 223-227 (Jun. 22, 1984).
A. Agarwal et al., "An Evaluation of Directory Schemes for Cache Coherence," 15th Intern Symp. Comp. Architecture, pp. 280-289 (Jun. 1988).
A. Agarwal et al., "An Analytical Cache Model," ACM Trans. on Comp Sys., vol. 7, No. (2), pp. 184-215 (May 1989).
Beresford, "How to Tame High Speed Design," High-Performance Systems, pp. 78-83 (Sep. 1989).
J. Carson, "Advanced On-Focal Plane Signal Processing for Non-Planar Infrared Mosaics", SPIE, vol. 311, pp. 53-58 (1981).
G. Chesley, "Virtual Memory Integration," Submitted to IEETC (Sep. 1983).
E. Davidson, "Electrical Design of a High Speed Computer Package," IBM J. Res. Develop., vol. 26, No. 3, pp. 349-361 (May 1982).
F. Hart, "Multiple Chips Speed CPU Subsystems," High-Performance Systems, pp. 46-55 (Sep. 1989).
M. Horowitz et al., "MIPS-X: A 20-MIPS Peak 32-bit Microprocessor with On-Chip Cache," IEEE J. Solid State Circuits, vol. SC-22, No. 5, pp. 790-798 (Oct. 1987).
S. Kwon et al., "Memory Chip Organizations for Improved Reliability in Virtual Memories," IBM Technical Disclosure Bulletin, vol. 25, No. 6, pp. 2952-57 (Nov. 1982).
R. Pease et al., "Physical Limits to the Useful Packaging Density of Electronic Systems," Stanford Center for Integrated Systems, Stanford University (Sep. 1988).
J. Peterson, "System-Level Concerns Set Performance Gains," High-Performance Systems, pp. 71-77 (Sep. 1989).
B. Wooley et al., "Active Substrate System Integration," Proceedings 1987 IEEE International Conference on Computer Design: VLSI in Computers & Processors (Rye Brook, New York Oct. 5, 1987).
D. Hawley, "Superfast Bus Supports Sophisticated Transactions," High Performance Systems, pp. 90-94 (Sep. 1989).
M. Johnson et al., "A Variable Delay Line PLL for CPU-Coprocessor Synchronization," IEEE Journal of Solid-State Circuits, vol. 23, No. 5 (Oct. 1988).
Primary Examiner: Auve; Glenn A.
Attorney, Agent or Firm: Blakely, Sokoloff, Taylor & Zafman LLP

Parent Case Text



This is a continuation of application Ser. No. 08/607,780, filed Feb. 27, 1996, now abandoned, which is a continuation of application Ser. No. 08/222,646, filed Mar. 31, 1994, which has issued as U.S. Pat. No. 5,513,327, which is a continuation of application Ser. No. 07/954,945, filed Sep. 30, 1992, which has issued as U.S. Pat. No. 5,319,755, which is a continuation of application Ser. No. 07/510,898, filed Apr. 18, 1990, now abandoned.

Claims


What is claimed is:
1. A memory subsystem for storing and retrieving data, comprising:
(1) at least one memory device that includes a bus interface, the memory device having at least one memory section comprised of a plurality of memory cells; and
(2) a bus, wherein the bus interface of the at least one memory device couples the memory device to the bus, wherein the bus comprises a group of controlled impedance transmission lines for carrying substantially all information necessary for a single memory device to receive a transaction request, including a memory transaction request, and for carrying substantially all information necessary for a single memory device to respond to the transaction request;
wherein the number of signaling lines is substantially less than the number of bits in the information necessary to request a memory transaction to store or retrieve data from the memory cells; and
wherein memory device selection information is time-multiplexed on the bus with other memory transaction request information.
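As an illustration only, the time-multiplexing recited in claim 1 can be sketched as a request whose bit count exceeds the number of bus lines and is therefore sent over several bus cycles, with the device selection field sharing the same lines as the rest of the request. The bus width (8 lines) and the field layout (8-bit device id, 8-bit opcode, 32-bit address) are assumptions made for this sketch and are not taken from the claims.

```python
BUS_WIDTH = 8        # assumed number of signaling lines (hypothetical)
REQUEST_BITS = 48    # assumed request size: 8-bit id + 8-bit opcode + 32-bit address

def pack_request(device_id: int, opcode: int, address: int) -> list:
    """Split one request into BUS_WIDTH-bit words, one word per bus cycle."""
    if not (0 <= device_id < 2**8 and 0 <= opcode < 2**8 and 0 <= address < 2**32):
        raise ValueError("field out of range")
    bits = (device_id << 40) | (opcode << 32) | address
    n_cycles = REQUEST_BITS // BUS_WIDTH
    mask = (1 << BUS_WIDTH) - 1
    # Most significant word first, so the device id occupies the bus in cycle 0.
    return [(bits >> (BUS_WIDTH * (n_cycles - 1 - i))) & mask for i in range(n_cycles)]

def unpack_request(words: list) -> tuple:
    """Reassemble the words seen on the bus into (device_id, opcode, address)."""
    bits = 0
    for word in words:
        bits = (bits << BUS_WIDTH) | word
    return (bits >> 40) & 0xFF, (bits >> 32) & 0xFF, bits & 0xFFFFFFFF

words = pack_request(0x12, 0x01, 0xDEADBEEF)
```

In this sketch a 48-bit request crosses an 8-line bus in six cycles, and no dedicated select line is needed because the device id travels in the first cycle.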

2. The memory subsystem of claim 1, wherein the transmission lines have substantially the same electrical characteristics.

3. The memory subsystem of claim 1, wherein the at least one memory device further comprises a device identification register, the contents of which are used to uniquely identify the at least one memory device among a plurality of memory devices connected to the bus;
wherein the transaction request includes a field specifying the memory device identification number; and
wherein the memory device responds to the transaction request when its identification number matches the identification number in the transaction request.

4. The memory subsystem of claim 3, further comprising configuration means for loading an identification value into the identification register.
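The device selection of claims 3 and 4 might be modeled as follows. This is a minimal sketch: the class and method names are hypothetical, and `load_id` merely stands in for whatever configuration means loads the identification register.

```python
class MemoryDevice:
    """Toy model of a device with a loadable identification register."""

    def __init__(self):
        self.device_id = None  # unset until configuration loads a value

    def load_id(self, value: int) -> None:
        # Stands in for the configuration means of claim 4 (assumed interface).
        self.device_id = value

    def should_respond(self, request_id: int) -> bool:
        # Respond only when the id field of the request matches the register.
        return self.device_id is not None and self.device_id == request_id

devices = [MemoryDevice() for _ in range(4)]
for number, device in enumerate(devices):
    device.load_id(number)

responders = [d.device_id for d in devices if d.should_respond(2)]
```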

5. The memory subsystem of claim 1, wherein the at least one memory device further comprises at least one address range register coupled to the at least one memory section, the contents of the address range register determining a range of addresses that corresponds to the memory section;
wherein the memory transaction request includes address information; and
wherein the at least one memory device responds to the memory transaction request when the address information falls within the range of addresses specified by the contents of the address range register.
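One possible reading of the address range register of claim 5 is sketched below. The base-plus-size encoding is an assumption; the claim only requires that the register contents determine a range of addresses for the memory section.

```python
from dataclasses import dataclass

@dataclass
class AddressRangeRegister:
    base: int  # first address served by the section (assumed encoding)
    size: int  # number of addresses in the section (assumed encoding)

    def contains(self, address: int) -> bool:
        return self.base <= address < self.base + self.size

def select_section(registers, address):
    """Return the index of the first section whose range covers address, else None."""
    for index, register in enumerate(registers):
        if register.contains(address):
            return index
    return None

sections = [AddressRangeRegister(0x0000, 0x1000), AddressRangeRegister(0x1000, 0x1000)]
```

A device whose registers return `None` for the requested address simply does not respond to the request.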

6. The memory subsystem of claim 1, wherein the at least one memory device further comprises at least one access time register whose contents indicate a delay time after which the memory device must respond to a transaction request.

7. The memory subsystem of claim 6, further comprising a clock signaling line, wherein the delay time is determined by a number of clock cycles of the clock signaling line.

8. The memory subsystem of claim 1, wherein the at least one memory device further comprises at least one control register containing information that affects the operation of the memory device;
wherein the transaction request includes information designating a register transaction to be performed by the memory device and address information for selecting one of the control registers; and
wherein the memory device responds to the transaction request by transferring data information between the transmission lines and the selected one of the control registers via the bus interface.

9. The memory subsystem of claim 1, wherein the bus interface has an input coupled to receive an external clock signal having at least two phases; and wherein the bus interface further comprises:
(i) a plurality of I/O pads;
(ii) internal clock generation circuitry for receiving the external clock signal and generating at least two internal clocks, each internal clock having a phase associated with one of the phases of the external clock signal; and
(iii) a plurality of input receiver groups, each group comprising at least two receivers, coupled to each I/O pad and the internal clocks for receiving information present during each phase of the external clock from the bus and transferring the bus information from the plurality of I/O pads to a plurality of internal input lines coupled to the receiver groups.

10. The memory subsystem of claim 1, wherein the bus interface includes a bus driver for each of the transmission lines, the bus driver being coupled between a transmission line and ground and operating as a current source for current having a predetermined value;
wherein a first voltage is present on the transmission line when the current source is not conducting current and a second voltage is present on the transmission line when the current source is conducting current of the predetermined value; and
wherein the difference between the first voltage and second voltage is proportional to the product of the predetermined value of the current source and the impedance of the transmission line.

11. The memory subsystem of claim 10, wherein the first voltage is determined by a termination resistor coupled between a termination voltage and one end of the transmission line.
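The proportionality recited in claims 10 and 11 can be written out as a short worked equation. The symbols $V_1$, $V_2$, $I_0$, $Z_0$, $V_{\text{term}}$ and the constant $k$ are introduced here for illustration; $k$ is assumed to depend on the termination topology, and the numeric values are hypothetical.

```latex
% First voltage: driver off, line held at the termination voltage (claim 11).
V_1 = V_{\text{term}}
% Second voltage: driver sinking the predetermined current I_0.
V_2 = V_{\text{term}} - k\, I_0\, Z_0
% Signal swing, proportional to the product of current and line impedance (claim 10).
\Delta V = V_1 - V_2 = k\, I_0\, Z_0
% Example with assumed values I_0 = 20 mA, Z_0 = 50 ohms, k = 1:
\Delta V = 1 \times 0.020\,\mathrm{A} \times 50\,\Omega = 1.0\,\mathrm{V}
```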

12. The memory subsystem of claim 1, wherein the bus interface includes internal clock generation circuitry receiving an external clock signal and generating a delay-locked internal clock having a desired phase difference between the external clock signal and the internal clock signal.

13. The memory subsystem of claim 12, wherein the internal clock generation circuitry comprises:
(1) a receiver for receiving the external clock signal and generating the internal clock;
(2) a delay line for adjustably delaying the internal clock to produce a delayed internal clock signal;
(3) means for detecting the phase difference between the delayed internal clock signal and the external clock signal and generating an error signal proportional to the phase difference; and
(4) means for actively and continuously adjusting the delay line based on the error signal so that the delayed internal clock signal has the desired phase difference relative to the external clock signal.
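The feedback loop of claim 13 can be sketched numerically. The model below is a simplified discrete-time approximation, not the circuit of the patent: times are in arbitrary units, the error signal is the wrapped phase difference, and the `gain` and `steps` parameters are assumptions.

```python
def lock_delay(period, fixed_delay, desired_phase, gain=0.5, steps=60):
    """Adjust a delay line until the delayed clock sits at desired_phase.

    fixed_delay models the receiver and distribution delay; the return value
    is the delay-line setting the loop converges to.
    """
    delay = 0.0
    for _ in range(steps):
        # Phase of the delayed internal clock relative to the external clock.
        phase = (fixed_delay + delay) % period
        # Signed error wrapped into [-period/2, period/2), proportional to the phase difference.
        error = (desired_phase - phase + period / 2) % period - period / 2
        delay += gain * error
        if delay < 0:
            delay += period  # a delay line cannot be negative; slip one full period
    return delay

setting = lock_delay(period=5.0, fixed_delay=1.3, desired_phase=2.5)
```

With these assumed numbers the loop settles at a delay-line setting of about 1.2 time units, so that the total delay of 2.5 units equals the desired phase.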

14. The memory device of claim 1,
wherein the bus interface has an input coupled to receive an external clock signal having at least two phases; and
wherein the bus interface comprises:
a plurality of I/O pads;
internal clock generation circuitry for receiving the external clock signal and generating at least two internal clocks, each internal clock having a phase associated with one of the phases of the external clock signal;
a plurality of bus drivers, each coupled to an I/O pad and having an input for receiving data to be transferred to the bus during each phase of the external clock signal; and
a plurality of multiplexers coupled to the internal clocks and to each bus driver, the multiplexer selectively coupling data information in response to the phases of the internal clocks to the input of the bus drivers.
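The two-phase multiplexing of claim 14, together with the matching receive side of claim 9, might be modeled as interleaving two internal word streams onto one pad. The function names and the list-based representation are hypothetical.

```python
def mux_two_phase(first_phase_words, second_phase_words):
    """Interleave two internal streams so the pad carries one word per clock phase."""
    if len(first_phase_words) != len(second_phase_words):
        raise ValueError("both phases need one word per clock cycle")
    bus = []
    for first, second in zip(first_phase_words, second_phase_words):
        bus.append(first)   # driven during the first phase of the cycle
        bus.append(second)  # driven during the second phase of the cycle
    return bus

def demux_two_phase(bus_words):
    """Receiver side: one receiver per phase samples every other word."""
    return bus_words[0::2], bus_words[1::2]

bus = mux_two_phase([1, 3], [2, 4])
```

Two clock cycles carry four words here, which is the doubling of data rate relative to clock rate that the two-phase scheme provides.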

15. A method of communicating over a bus in a memory subsystem, the method comprising the steps of:
(1) a master transmitting on the bus a transaction request,
wherein the bus comprises a group of signaling lines that carry substantially all information necessary for a single memory device to receive a transaction request, including a memory transaction request,
wherein the signaling lines carry substantially all information necessary for a single memory device to respond to the transaction request, and
wherein the number of signaling lines is substantially less than the number of bits in the memory transaction request;
wherein memory device selection information is time-multiplexed on the bus with other memory transaction request information;
(2) the memory device receiving the transaction request transmitted on the signaling lines;
(3) the memory device determining whether it should respond to the transaction request; and
(4) if the memory device has determined that it should respond to the transaction request, then the memory device responding to the transaction request by receiving data information sent by the master on the signaling lines when receiving data information is necessary to perform a first type of transaction indicated in the transaction request, or by transmitting data information to the master on the signaling lines when transmitting data information is necessary to perform a second type of transaction indicated in the transaction request.

16. The method of claim 15, wherein the transaction request includes address information, and wherein the step of the memory determining whether it should respond to the transaction request comprises determining whether the memory device should respond to the address information in the transaction request.

17. The method of claim 15, wherein the memory device comprises a device identification register whose contents uniquely identify the memory device;
wherein the transaction request further includes device identification information; and
wherein the step of the memory determining whether it should respond to the transaction request comprises determining whether the device identification information in the transaction request matches the device information in the device identification register.

18. The method of claim 17, wherein the device identification information in the transaction request designates all the memory devices coupled to the bus so that all of the memory devices respond to the transaction request.

19. The method of claim 15, wherein the memory device has a plurality of access time registers each loaded with an access time, the access time being a delay time after the receipt of the transaction request at which the memory device must respond to the transaction indicated in the transaction request;
wherein the transaction request further comprises access time information designating an access time register; and
wherein the step of the memory device responding to the transaction request comprises responding at a time specified by the access time register designated by the access time information in the transaction request.

20. The method of claim 19, wherein the master issues transaction requests that designate access times for responses so that possession of the bus is at all times controlled and scheduled by the master to increase bus utilization.
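The master-controlled scheduling of claims 19 and 20 can be illustrated with a small model in which each request names an access time register and the master checks that the resulting responses never overlap on the bus. The class and function names, the fixed burst length, and the decision to ignore the bus time of the request packets themselves are simplifying assumptions.

```python
class Slave:
    def __init__(self, access_times):
        # Access time registers, each holding a delay in clock cycles.
        self.access_times = list(access_times)

    def response_cycle(self, request_cycle, register_index):
        return request_cycle + self.access_times[register_index]

def schedule(requests, slave, burst_length):
    """Return [start, end) bus intervals for the responses; raise on overlap.

    requests is a list of (request_cycle, register_index) pairs chosen by the master.
    """
    busy = []
    for request_cycle, register_index in requests:
        start = slave.response_cycle(request_cycle, register_index)
        end = start + burst_length
        for other_start, other_end in busy:
            if start < other_end and other_start < end:
                raise ValueError("bus collision")
        busy.append((start, end))
    return busy

slave = Slave([4, 8])
plan = schedule([(0, 0), (2, 1)], slave, burst_length=4)
```

Because the master picks the register for every request, it knows in advance when each response will occupy the bus and can pack responses back to back.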

21. The method of claim 15, wherein the transaction request is a register transaction request comprising address information and direction information;
wherein the memory device includes a plurality of control registers, the address information specifying one of the control registers; and
wherein the step of responding to the transaction specified in the transaction request comprises transmitting information from one of the control registers according to the direction information or receiving information from one of the control registers according to the direction information.

22. The method of claim 21, wherein the control register is an access time register, the contents of which determine a delay time at which the memory device must respond to a transaction request.

23. The method of claim 21, wherein the control register is an identification register, the contents of which uniquely identify the memory device.

24. The method of claim 21, wherein the control register is an address range register, the contents of which describe a range of addresses for which the memory device must respond to the transaction request.

25. A memory device, comprising:
(1) at least one memory section comprised of a plurality of memory cells;
(2) a bus interface for coupling the memory section to a bus having a group of signaling lines, wherein the number of signaling lines is substantially less than the number of bits in the information necessary to request a memory transaction to store or retrieve data from the memory cells;
wherein the signaling lines carry substantially all information necessary for a single memory device to receive the memory transaction request;
wherein the signaling lines carry substantially all information necessary for a single memory device to respond to the memory transaction request;
wherein memory device selection information is time-multiplexed on the bus with other memory transaction request information; and
wherein the signaling lines comprise controlled impedance transmission lines.

26. The memory device of claim 25, wherein the transmission lines have substantially the same electrical characteristics.

27. The memory device of claim 25, wherein the bus interface includes a bus driver for each of the signaling lines, wherein each bus driver is coupled between a signaling line and ground and operates as a current source for current having a predetermined value;
wherein a first voltage is present on the signaling line when the current source is not conducting current;
wherein a second voltage is present on the signaling line when the current source is conducting current of the predetermined value;
wherein the difference between the first and second voltage is proportional to the product of the predetermined value of the current of the current source and the impedance of the signaling line; and
wherein the first voltage is determined by a termination resistor coupled between a termination voltage and one end of the signaling line.

28. An apparatus comprising:
(1) at least two semiconductor devices;
(2) a group of signaling lines to which the semiconductor devices are coupled,
wherein the number of signaling lines is substantially less than the number of bits in the information necessary to request a transaction to be performed by a single semiconductor device,
wherein the signaling lines carry substantially all information necessary for a single semiconductor device to receive the transaction request,
wherein the signaling lines carry substantially all information necessary for a single semiconductor device to respond to the transaction request,
wherein the signaling lines comprise transmission lines having substantially the same electrical characteristics and the transmission lines are of a controlled impedance,
wherein the signals on the signaling lines are created from current source drivers; and
(3) at least one clock line for transmission of a pair of clock signals to be received by the semiconductor devices,
wherein the clock signals have a clock rate, a short cycle time, and a clock cycle having a first phase and a second phase,
wherein information necessary to request and to respond to a transaction is placed on the group of signaling lines in response to the start of the first phase, and
wherein information necessary to request and to respond to a transaction is placed on the group of signaling lines in response to the start of the second phase so that a data information rate is twice the clock rate.

29. An apparatus for storing and retrieving data, the apparatus comprising:
(1) a multiline bus for transmitting address information, control information, and data, wherein the multiline bus has a total number of lines less than a total number of bits in any single address and wherein the control information includes information for selecting memories;
(2) a master coupled to the multiline bus for (A) initiating a read operation by placing on the multiline bus a read request packet comprising (i) read control information comprising a read device identifier, a read cycle type, and a maximum data transfer length indicator and (ii) read address information comprising a read starting address, and for (B) initiating a write operation by placing on the multiline bus (i) a write request packet comprising (a) write control information comprising a write device identifier and a write cycle type and (b) write address information comprising a write starting address and (ii) write data;
(3) a first memory coupled to the multiline bus for responding to the read request packet from the master by placing onto the multiline bus data returned from a location within the first memory beginning at the read starting address if the read identifier matches an identifier of the first memory;
(4) a second memory coupled to the multiline bus for responding to the write request packet from the master by storing in the second memory at the write starting address the write data if the write device identifier matches an identifier of the second memory.
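The read and write request packets of claim 29 might be modeled as below. This is a sketch under stated assumptions: the field names, the string encoding of the cycle type, and the return conventions (`None` when the device is not addressed, an empty list as a write acknowledgement) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RequestPacket:
    device_id: int      # selects the memory; there is no dedicated select line
    cycle_type: str     # "read" or "write" (assumed encoding)
    start_address: int
    max_length: int = 0  # maximum data transfer length, used by reads

class Memory:
    def __init__(self, ident, size):
        self.ident = ident
        self.cells = [0] * size

    def handle(self, packet, write_data=None):
        if packet.device_id != self.ident:
            return None  # not addressed: stay off the bus
        start = packet.start_address
        if packet.cycle_type == "read":
            return self.cells[start:start + packet.max_length]
        if packet.cycle_type == "write":
            if start + len(write_data) > len(self.cells):
                raise ValueError("write runs past end of memory")
            self.cells[start:start + len(write_data)] = write_data
            return []
        raise ValueError("unknown cycle type")

first, second = Memory(1, 16), Memory(2, 16)
write = RequestPacket(device_id=2, cycle_type="write", start_address=4)
read = RequestPacket(device_id=2, cycle_type="read", start_address=4, max_length=3)
```

Every memory on the bus sees every packet, and only the one whose identifier matches acts on it.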

30. The apparatus of claim 29, further comprising a clock coupled to the master, the first memory, and the second memory.

31. The apparatus of claim 29, wherein the multiline bus does not have dedicated memory select lines for selecting memories.

32. The apparatus of claim 29, wherein the master is a central processing unit.

33. An apparatus for high-speed access to blocks of data according to a synchronous, split transaction, block-oriented protocol, the apparatus comprising:
(1) a multiline bus for transmitting address information, control information, and blocks of data, wherein the multiline bus has a total number of lines less than the total number of bits in a single address, wherein the control information includes information for selecting memories;
(2) a clock running at a clock rate;
(3) a master coupled to the clock and to the multiline bus for (A) initiating a clocked read operation of a first block of data by placing on the multiline bus a read request packet comprising read address and control information and for (B) initiating a clocked write operation of a second block of data by placing on the multiline bus (i) a write request packet comprising write address and control information and (ii) the second block of data;
(4) a first memory coupled to the clock and to the multiline bus for responding to the read request packet from the master by placing onto the multiline bus in a clocked manner the first block of data returned from a location within the first memory if the read request packet identifies the first memory;
(5) a second memory coupled to the clock and to the multiline bus for responding to the write request packet from the master by storing in the second memory the second block of data received from the multiline bus in a clocked manner if the write request packet identifies the second memory.

34. An apparatus for high-bandwidth data transfer, the apparatus comprising:
(1) a multiline bus for transmitting address information, control information, and data, wherein the control information includes information for selecting memories without the use of dedicated memory select lines;
(2) a master coupled to the multiline bus for initiating a read operation by placing on the multiline bus a read request packet and for initiating a write operation by placing on the multiline bus a write request packet followed by write data;
(3) a first memory coupled to the multiline bus via a bus interface built into the first memory and residing on a single edge of the first memory, wherein the first memory responds to the read request packet from the master by placing on the multiline bus data returned from a location within the first memory if the read request packet identifies the first memory;
(4) a second memory coupled to the multiline bus via a bus interface built into the second memory and residing on a single edge of the second memory, wherein the second memory responds to the write request packet from the master by storing in the second memory data received from the multiline bus if the write request packet identifies the second memory.

35. The apparatus of claim 34, wherein the bus interface of the first memory and the bus interface of the second memory each includes an electrical interface, address comparison registers, timing registers, and a memory array access path.

36. A method for retrieving data, comprising the steps of:
(1) having a master initiate a read operation by placing onto a multiline bus a read request packet comprising a read device identifier, a read cycle type, a maximum transfer length indicator, and a read starting address;
(2) having a first memory of a plurality of memories coupled to the multiline bus respond to the read request packet from the master by placing onto the multiline bus data returned from a location within the first memory beginning at the read starting address if the read device identifier matches an identifier of the first memory.

37. A method for storing data, comprising the steps of:
(1) having a master initiate a write operation by placing onto a multiline bus (A) a write request packet comprising a write device identifier, a write cycle type, and a write starting address and (B) write data; and
(2) having a first memory of a plurality of memories coupled to the multiline bus respond to the write request packet from the master by storing in the first memory at the write starting address the write data if the write device identifier matches an identifier of the first memory.

38. The method of claim 37, further comprising the step of:
having the plurality of memories including the first memory respond to the write request packet from the master by storing in each of the plurality of memories at the write starting address the write data if the write device identifier matches a broadcast identifier.

39. A method for storing and retrieving data, comprising the steps of:
(1) having a master attempt a read operation by placing onto a multiline bus a read request packet comprising a read device identifier, a read cycle type, a maximum transfer length indicator, and a read starting address;
(2) having the master attempt a write operation by placing onto a multiline bus (A) a write request packet comprising a write device identifier, a write cycle type, and a write starting address and (B) write data;
(3) if the read device identifier or write device identifier matches an identifier of a first memory of a plurality of memories coupled to the multiline bus and the first memory cannot respond to the read request packet or the write request packet, then having the first memory place onto the multiline bus a retry message requesting that the master retry a request.
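The retry behavior of claim 39 can be sketched as a memory that answers with a retry message while it is unable to respond, and a master that reissues the request. The names, the string replies, and the fixed back-off of one cycle are assumptions made for this example.

```python
class BusyMemory:
    def __init__(self, ident, busy_until):
        self.ident = ident
        self.busy_until = busy_until  # first cycle at which the memory can respond

    def request(self, device_id, now):
        if device_id != self.ident:
            return None      # not addressed
        if now < self.busy_until:
            return "RETRY"   # cannot respond yet: ask the master to retry
        return "OK"

def master_issue(memory, device_id, start=0, max_attempts=10, backoff=1):
    """Reissue a request until it is accepted; return (reply, attempts used)."""
    now = start
    for attempt in range(1, max_attempts + 1):
        reply = memory.request(device_id, now)
        if reply != "RETRY":
            return reply, attempt
        now += backoff
    return "RETRY", max_attempts

memory = BusyMemory(ident=3, busy_until=2)
outcome = master_issue(memory, device_id=3)
```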

40. An apparatus for high-bandwidth data transfer, the apparatus comprising:
(1) a multiline bus for transmitting address information, control information, and data, wherein the multiline bus has a total number of lines less than a total number of bits in any single address, wherein the control information includes information for selecting memories;
(2) a master coupled to the multiline bus for (A) initiating a read operation by placing on the multiline bus a read request packet comprising read address and control information and for (B) initiating a write operation by placing on the multiline bus (i) a write request packet comprising write address and control information and (ii) write data;
(3) a first memory coupled to the multiline bus for responding to the read request packet from the master by placing onto the multiline bus data returned from a location within the first memory if the read request packet identifies the first memory;
(4) a second memory coupled to the multiline bus for responding to the write request packet from the master by storing in the second memory data received from the multiline bus if the write request packet identifies the second memory; and
(5) timing circuitry for ensuring that the first memory places data onto the multiline bus at a predetermined time.

41. An apparatus comprising:
(1) a master coupled to a bus for initiating a response by placing on the bus a request packet; and
(2) a slave coupled to the bus, wherein the slave includes an access-time register that stores a value that determines a timing of a response by the slave to the request packet.

42. The apparatus of claim 41, wherein the access-time register is coupled to the bus and wherein the value stored in the access-time register is settable by the master via a bus transaction.

43. The apparatus of claim 41, wherein the value stored in the access-time register is hardwired.

44. An apparatus comprising:
(1) a slave coupled to a bus, wherein the slave includes a plurality of selectable access-time registers storing a plurality of timing values, wherein a timing value stored by an access-time register selected from the plurality of access-time registers determines a timing of a response by the slave to a request packet;
(2) a master coupled to a bus for initiating the response by the slave by placing on the bus the request packet, wherein the request packet includes code for selecting one of the plurality of access-time registers.

45. The apparatus of claim 44, wherein the plurality of access-time registers are coupled to the bus and wherein the plurality of timing values are settable by the master via a bus transaction.

46. The apparatus of claim 44, wherein one of the plurality of timing values is hardwired and another of the plurality of timing values is settable by the master via a bus transaction.

47. A memory device, comprising:
(1) at least one memory section comprised of a plurality of memory cells;
(2) a bus interface for coupling the memory section to a bus comprised of a group of signaling lines, wherein the number of signaling lines is substantially less than the number of bits in a memory transaction request to store or retrieve data from the memory cells;
wherein the signaling lines carry substantially all information necessary for a single memory device to receive the memory transaction request;
wherein the signaling lines carry substantially all information necessary for a single memory device to respond to the memory transaction request;
wherein substantially all information necessary for a single memory device to receive the memory transaction request includes address information and control information;
wherein the control information includes memory device selection information; and
wherein the control information is time multiplexed with the address information over the group of signaling lines.

48. A method of communicating over a bus to a memory device, the bus comprising a group of signaling lines, the method comprising the steps of:
(1) a master transmitting on the bus a transaction request, the bus carrying substantially all address, data, and control information needed by the memory device to receive and respond to the transaction request and having substantially fewer signaling lines than the number of bits in the address information, the control information including memory device selection information and being time-multiplexed with address information over the group of signaling lines;
(2) the memory device receiving the transaction request transmitted on the signaling lines; and
(3) the memory device responding to the received transaction request according to the information in the transaction request.

49. An apparatus comprising:
(1) at least two semiconductor devices, each having a bus interface; and
(2) a bidirectional bus comprising a group of signaling lines and coupling to the bus interfaces of the semiconductor devices, wherein the number of signaling lines is substantially less than the number of bits in the information necessary for one device to request a transaction to be performed by the other device, wherein the signaling lines carry substantially all information necessary for the semiconductor device to receive and respond to the transaction request, and wherein the transaction request includes device selection information time-multiplexed on the bus with other transaction request information.

50. A memory device comprising:
at least one memory section comprised of a plurality of memory cells;
a synchronous bus interface coupling the memory section to a bus comprising a group of signaling lines, the synchronous bus interface having an input coupled to receive an external clock signal and receiving a transaction request on the bus synchronously with the external clock signal, the synchronous bus interface receiving or transferring data information on the bus synchronously with the external clock signal in response to the transaction request; and
wherein the bus carries substantially all address, data, and control information needed by the memory device to receive and respond to the transaction request and the bus has substantially fewer bus lines than the number of bits in the address information.

51. The memory device of claim 50, wherein the control information is time-multiplexed with the address information over the group of signaling lines.

52. A memory device comprising:
at least one dynamic memory section comprised of a plurality of memory cells; and
a synchronous bus interface coupling the memory section to a bus comprising a group of signaling lines, the synchronous bus interface having an input coupled to receive an external clock signal, wherein the interface receives control and address information on the bus synchronously with the external clock signal, the control information being time-multiplexed with address information over the group of signaling lines and wherein the bus has substantially fewer bus lines than the number of bits in the address information.

53. The memory device of claim 52, wherein the external clock signal has a cycle with at least two phases, the transaction request comprising information that is valid on each phase of the external clock signal.

54. The memory device of claim 52, the external clock signal having a cycle with at least two phases, the memory device response comprising information that is valid on each phase of the external clock signal.

55. The memory device of claim 52, wherein the bus interface includes:
internal clock generation circuitry receiving the external clock signal and generating a delay-locked internal clock with a desired phase difference between the external clock signal and the internal clock, wherein the internal clock generation circuitry comprises:
a clock receiver receiving the external clock signal and generating an internal clock;
a delay line for generating a delayed internal clock from the internal clock in response to a delay adjustment signal;
circuitry for determining a phase difference signal between the delayed internal clock and external clock signal; and
circuitry for generating the delay adjustment signal to adjust the delay line in response to the phase difference signal so that the delayed internal clock becomes delay-locked with the desired phase difference relative to the external clock.

56. The memory device of claim 52, wherein the memory device further comprises at least one access time register whose contents indicates a delay time after which the memory device must respond to a transaction request.

57. An apparatus comprising:
at least two semiconductor devices each having a bus interface; and
a bidirectional bus comprising a group of signaling lines and coupling to the bus interfaces of the semiconductor devices;
wherein the number of signaling lines is substantially less than the number of bits in the information necessary for one device to request a transaction to be performed by the other device;
wherein the signaling lines carry substantially all information necessary for the semiconductor device to receive and respond to the transaction request; and
wherein the transaction request includes device selection information time-multiplexed on the bus with other transaction request information.

Description

FIELD OF THE INVENTION

An integrated circuit bus interface for computer and video systems is described which allows high speed transfer of blocks of data, particularly to and from memory devices, with reduced power consumption and increased system reliability. A new method of physically implementing the bus architecture is also described.

BACKGROUND OF THE INVENTION

Semiconductor computer memories have traditionally been designed and structured to use one memory device for each bit, or small group of bits, of any individual computer word, where the word size is governed by the choice of computer. Typical word sizes range from 4 to 64 bits. Each memory device typically is connected in parallel to a series of address lines and connected to one of a series of data lines. When the computer seeks to read from or write to a specific memory location, an address is put on the address lines and some or all of the memory devices are activated using a separate device select line for each needed device. One or more devices may be connected to each data line but typically only a small number of data lines are connected to a single memory device. Thus data line 0 is connected to device(s) 0, data line 1 is connected to device(s) 1, and so on. Data is thus accessed or provided in parallel for each memory read or write operation. For the system to operate properly, every single memory bit in every memory device must operate dependably and correctly.

To understand the concept of the present invention, it is helpful to review the architecture of conventional memory devices. Internal to nearly all types of memory devices (including the most widely used Dynamic Random Access Memory (DRAM), Static RAM (SRAM) and Read Only Memory (ROM) devices), a large number of bits are accessed in parallel each time the system carries out a memory access cycle. However, only a small percentage of accessed bits which are available internally each time the memory device is cycled ever make it across the device boundary to the external world.

Referring to FIG. 1, all modern DRAM, SRAM and ROM designs have internal architectures with row (word) lines 5 and column (bit) lines 6 to allow the memory cells to tile a two dimensional area 1. One bit of data is stored at the intersection of each word and bit line. When a particular word line is enabled, all of the corresponding data bits are transferred onto the bit lines. Some prior art DRAMs take advantage of this organization to reduce the number of pins needed to transmit the address. The address of a given memory cell is split into two addresses, row and column, each of which can be multiplexed over a bus only half as wide as the memory cell address of the prior art would have required.
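
By way of illustration only, the row/column address multiplexing described above can be sketched as follows. The sketch assumes a hypothetical 22-bit cell address (a 4 Mbit array) split evenly into 11 row bits and 11 column bits; the function names and the even split are assumptions made for this example and do not describe any particular prior art device.

```python
def split_address(cell_address: int, column_bits: int) -> tuple[int, int]:
    """Split a flat cell address into (row, column) parts.

    The upper bits select the word (row) line and the lower
    `column_bits` bits select the bit (column) line.
    """
    row = cell_address >> column_bits
    column = cell_address & ((1 << column_bits) - 1)
    return row, column


def join_address(row: int, column: int, column_bits: int) -> int:
    """Rebuild the flat cell address from its row and column parts."""
    return (row << column_bits) | column


# Hypothetical 4 Mbit array: 22 address bits sent as two 11-bit halves,
# so only 11 address pins are needed instead of 22.
COLUMN_BITS = 11
row, column = split_address((5 << COLUMN_BITS) | 7, COLUMN_BITS)
```

Each half is sent in turn over the same pins, which is why the address bus need be only half as wide as the full cell address.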

COMPARISON WITH PRIOR ART

Prior art memory systems have attempted to solve the problem of high speed access to memory with limited success. U.S. Pat. No. 3,821,715 (Hoff et al.) was issued to Intel Corporation for the earliest 4-bit microprocessor. That patent describes a bus connecting a single central processing unit (CPU) with multiple RAMs and ROMs. That bus multiplexes addresses and data over a 4-bit wide bus and uses point-to-point control signals to select particular RAMs or ROMs. The access time is fixed and only a single processing element is permitted. There is no block-mode type of operation, and most important, not all of the interface signals between the devices are bused (the ROM and RAM control lines and the RAM select lines are point-to-point).

In U.S. Pat. No. 4,315,308 (Jackson), a bus connecting a single CPU to a bus interface unit is described. The invention uses multiplexed address, data, and control information over a single 16-bit wide bus. Block-mode operations are defined, with the length of the block sent as part of the control sequence. In addition, variable access-time operations using a "stretch" cycle signal are provided. There are no multiple processing elements and no capability for multiple outstanding requests, and again, not all of the interface signals are bused.

In U.S. Pat. No. 4,449,207 (Kung et al.), a DRAM is described which multiplexes address and data on an internal bus. The external interface to this DRAM is conventional, with separate control, address and data connections.

In U.S. Pat. Nos. 4,764,846 and 4,706,166 (Go), a 3-D package arrangement of stacked die with connections along a single edge is described. Such packages are difficult to use because of the point-to-point wiring required to interconnect conventional memory devices with processing elements. Both patents describe complex schemes for solving these problems. No attempt is made to solve the problem by changing the interface.

In U.S. Pat. No. 3,969,706 (Proebsting et al.), the current state-of-the-art DRAM interface is described. The address is two-way multiplexed, and there are separate pins for data and control (RAS, CAS, WE, CS). The number of pins grows with the size of the DRAM, and many of the connections must be made point-to-point in a memory system using such DRAMs.

There are many backplane buses described in the prior art, but not in the combination described or having the features of this invention. Many backplane buses multiplex addresses and data on a single bus (e.g., the NU bus). ELXSI and others have implemented split-transaction buses (U.S. Pat. Nos. 4,595,923 and 4,481,625 (Roberts)). ELXSI has also implemented a relatively low-voltage-swing current-mode ECL driver (approximately 1 V swing). Address-space registers are implemented on most backplane buses, as is some form of block mode operation.

Nearly all modern backplane buses implement some type of arbitration scheme, but the arbitration scheme used in this invention differs from each of these. U.S. Pat. No. 4,837,682 (Culler), U.S. Pat. No. 4,818,985 (Ikeda), U.S. Pat. No. 4,779,089 (Theus) and U.S. Pat. No. 4,745,548 (Blahut) describe prior art schemes. All involve either log N extra signals (Theus, Blahut), where N is the number of potential bus requesters, or additional delay to get control of the bus (Ikeda, Culler). None of the buses described in patents or other literature use only bused connections. All contain some point-to-point connections on the backplane. None of the other aspects of this invention such as power reduction by fetching each data block from a single device or compact and low-cost 3-D packaging even apply to backplane buses.

The clocking scheme used in this invention has not been used before and in fact would be difficult to implement in backplane buses due to the signal degradation caused by connector stubs. U.S. Pat. No. 4,247,817 (Heller) describes a clocking scheme using two clock lines, but relies on ramp-shaped clock signals in contrast to the normal rise-time signals used in the present invention.

In U.S. Pat. No. 4,646,270 (Voss), a video RAM is described which implements a parallel-load, serial-out shift register on the output of a DRAM. This generally allows greatly improved bandwidth (and has been extended to 2, 4 and greater width shift-out paths). The rest of the interfaces to the DRAM (RAS, CAS, multiplexed address, etc.) remain the same as for conventional DRAMs.

One object of the present invention is to use a new bus interface built into semiconductor devices to support high-speed access to large blocks of data from a single memory device by an external user of the data, such as a microprocessor, in an efficient and cost-effective manner.

Another object of this invention is to provide a clocking scheme to permit high speed clock signals to be sent along the bus with minimal clock skew between devices.

Another object of this invention is to allow mapping out defective memory devices or portions of memory devices.

Another object of this invention is to provide a method for distinguishing otherwise identical devices by assigning a unique identifier to each device.

Yet another object of this invention is to provide a method for transferring address, data and control information over a relatively narrow bus and to provide a method of bus arbitration when multiple devices seek to use the bus simultaneously.

Another object of this invention is to provide a method of distributing a high-speed memory cache within the DRAM chips of a memory system which is much more effective than previous cache methods.

Another object of this invention is to provide devices, especially DRAMs, suitable for use with the bus architecture of the invention.

SUMMARY OF INVENTION

The present invention includes a memory subsystem comprising at least two semiconductor devices, including at least one memory device, connected in parallel to a bus, where the bus includes a plurality of bus lines for carrying substantially all address, data and control information needed by said memory devices, where the control information includes device-select information and the bus has substantially fewer bus lines than the number of bits in a single address, and the bus carries device-select information without the need for separate device-select lines connected directly to individual devices.

Referring to FIG. 2, a standard DRAM 13, 14, ROM (or SRAM) 12, microprocessor CPU 11, I/O device, disk controller or other special purpose device such as a high speed switch is modified to use a wholly bus-based interface rather than the prior art combination of point-to-point and bus-based wiring used with conventional versions of these devices. The new bus includes clock signals, power and multiplexed address, data and control signals. In a preferred implementation, 8 bus data lines and an AddressValid bus line carry address, data and control information for memory addresses up to 40 bits wide. Persons skilled in the art will recognize that 16 bus data lines or other numbers of bus data lines can be used to implement the teaching of this invention. The new bus is used to connect elements such as memory, peripheral, switch and processing units.

In the system of this invention, DRAMs and other devices receive address and control information over the bus and transmit or receive requested data over the same bus. Each memory device contains only a single bus interface with no other signal pins. Other devices that may be included in the system can connect to the bus and other non-bus lines, such as input/output lines. The bus supports large data block transfers and split transactions to allow a user to achieve high bus utilization. This ability to rapidly read or write a large block of data to one single device at a time is an important advantage of this invention.

The DRAMs that connect to this bus differ from conventional DRAMs in a number of ways. Registers are provided which may store control information, device identification, device-type and other information appropriate for the chip such as the address range for each independent portion of the device. New bus interface circuits must be added and the internals of prior art DRAM devices need to be modified so they can provide and accept data to and from the bus at the peak data rate of the bus. This requires changes to the column access circuitry in the DRAM, with only a minimal increase in die size. A circuit is provided to generate a low skew internal device clock for devices on the bus, and other circuits provide for demultiplexing input and multiplexing output signals.

High bus bandwidth is achieved by running the bus at a very high clock rate (hundreds of MHz). This high clock rate is made possible by the constrained environment of the bus. The bus lines are controlled-impedance, doubly-terminated lines. For a data rate of 500 MHz, the maximum bus propagation time is less than 1 ns (the physical bus length is about 10 cm). In addition, because of the packaging used, the pitch of the pins can be very close to the pitch of the pads. The loading on the bus resulting from the individual devices is very small. In a preferred implementation, this generally allows stub capacitances of 1-2 pF and inductances of 0.5-2 nH. Each device 15, 16, 17, shown in FIG. 3, only has pins on one side and these pins connect directly to the bus 18. A transceiver device 19 can be included to interface multiple units to a higher order bus through pins 20.

A primary result of the architecture of this invention is to increase the bandwidth of DRAM access. The invention also reduces manufacturing and production costs and power consumption, and increases packing density and system reliability.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram which illustrates the basic 2-D organization of memory devices.

FIG. 2 is a schematic block diagram which illustrates the parallel connection of all bus lines and the serial Reset line to each device in the system.

FIG. 3 is a perspective view of a system of the invention which illustrates the 3-D packaging of semiconductor devices on the primary bus.

FIG. 4 shows the format of a request packet.

FIG. 5 shows the format of a retry response from a slave.

FIG. 6 shows the bus cycles after a request packet collision occurs on the bus and how arbitration is handled.

FIG. 7 shows the timing whereby signals from two devices can overlap temporarily and drive the bus at the same time.

FIG. 8 shows the connection and timing between bus clocks and devices on the bus.

FIG. 9 is a perspective view showing how transceivers can be used to connect a number of bus units to a transceiver bus.

FIG. 10 is a block and schematic diagram of input/output circuitry used to connect devices to the bus.

FIG. 11 is a schematic diagram of a clocked sense-amplifier used as a bus input receiver.

FIG. 12 is a block diagram showing how the internal device clock is generated from two bus clock signals using a set of adjustable delay lines.

FIG. 13 is a timing diagram showing the relationship of signals in the block diagram of FIG. 12.

FIG. 14 is a timing diagram of a preferred means of implementing the reset procedure of this invention.

FIG. 15 is a diagram illustrating the general organization of a 4 Mbit DRAM divided into 8 subarrays.

DETAILED DESCRIPTION

The present invention is designed to provide a high speed, multiplexed bus for communication between processing devices and memory devices and to provide devices adapted for use in the bus system. The invention can also be used to connect processing devices and other devices, such as I/O interfaces or disk controllers, with or without memory devices on the bus. The bus consists of a relatively small number of lines connected in parallel to each device on the bus. The bus carries substantially all address, data and control information needed by devices for communication with other devices on the bus. In many systems using the present invention, the bus carries almost every signal between every device in the entire system. There is no need for separate device-select lines since device-select information for each device on the bus is carried over the bus. There is no need for separate address and data lines because address and data information can be sent over the same lines. Using the organization described herein, very large addresses (40 bits in the preferred implementation) and large data blocks (1024 bytes) can be sent over a small number of bus lines (8 plus one control line in the preferred implementation).
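
As a rough sketch of how a 40-bit address can travel over 8 bus data lines, the example below slices an address into one byte per bus cycle and reassembles it at the receiver. The low-byte-first ordering and the function names are assumptions made for illustration; the actual request packet layout is the one shown in FIG. 4, and the AddrValid line is not modeled here.

```python
def address_to_bus_bytes(address: int, address_bits: int = 40,
                         bus_width: int = 8) -> list[int]:
    """Slice an address into bus-width pieces, one piece per bus cycle.

    Assumption for this sketch: the least significant piece goes first.
    """
    if not 0 <= address < (1 << address_bits):
        raise ValueError("address does not fit in address_bits")
    cycles = -(-address_bits // bus_width)  # ceiling division
    mask = (1 << bus_width) - 1
    return [(address >> (bus_width * i)) & mask for i in range(cycles)]


def bus_bytes_to_address(pieces: list[int], bus_width: int = 8) -> int:
    """Reassemble the address from the pieces received on the bus."""
    address = 0
    for i, piece in enumerate(pieces):
        address |= piece << (bus_width * i)
    return address


pieces = address_to_bus_bytes(0x123456789A)  # five bus cycles for 40 bits
```

With 8 data lines, a 40-bit address occupies five bus cycles; with 16 data lines it would occupy three.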

Virtually all of the signals needed by a computer system can be sent over the bus. Persons skilled in the art recognize that certain devices, such as CPUs, may be connected to other signal lines and possibly to independent buses, for example a bus to an independent cache memory, in addition to the bus of this invention. Certain devices, for example cross-point switches, could be connected to multiple, independent buses of this invention. In the preferred implementation, memory devices are provided that have no connections other than the bus connections described herein and CPUs are provided that use the bus of this invention as the principal, if not exclusive, connection to memory and to other devices on the bus.

All modern DRAM, SRAM and ROM designs have internal architectures with row (word) and column (bit) lines to efficiently tile a 2-D area. Referring to FIG. 1, one bit of data is stored at the intersection of each word line 5 and bit line 6. When a particular word line is enabled, all of the corresponding data bits are transferred onto the bit lines. This data, about 4000 bits at a time in a 4 Mbit DRAM, is then loaded into column sense amplifiers 3 and held for use by the I/O circuits.

In the invention presented here, the data from the sense amplifiers is enabled 32 bits at a time onto an internal device bus running at approximately 125 MHz. This internal device bus moves the data to the periphery of the devices where the data is multiplexed into an 8-bit wide external bus interface, running at approximately 500 MHz.
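
The figures in the preceding paragraph are consistent with one another, as the short calculation below shows. The constant names are chosen for this example only; the values are the approximate ones stated above.

```python
# Approximate values stated in the text above.
INTERNAL_WIDTH_BITS = 32   # bits enabled from the sense amplifiers at a time
INTERNAL_CLOCK_MHZ = 125   # internal device bus rate
EXTERNAL_WIDTH_BITS = 8    # external bus data lines
EXTERNAL_RATE_MHZ = 500    # external bus data rate

internal_mbit_per_s = INTERNAL_WIDTH_BITS * INTERNAL_CLOCK_MHZ
external_mbit_per_s = EXTERNAL_WIDTH_BITS * EXTERNAL_RATE_MHZ

# Each 32-bit internal word is multiplexed out as this many 8-bit transfers.
mux_ratio = INTERNAL_WIDTH_BITS // EXTERNAL_WIDTH_BITS

# Peak external bandwidth expressed in megabytes per second.
external_mbyte_per_s = external_mbit_per_s // 8
```

Both sides carry about 4000 Mbit/s (500 Mbyte/s), so a 4:1 multiplexer at the device periphery lets the slower, wider internal bus keep the faster, narrower external bus fully supplied.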

The bus architecture of this invention connects master or bus controller devices, such as CPUs, Direct Memory Access devices (DMAs) or Floating Point Units (FPUs), and slave devices, such as DRAM, SRAM or ROM memory devices. A slave device responds to control signals; a master sends control signals. Persons skilled in the art realize that some devices may behave as both master and slave at various times, depending on the mode of operation and the state of the system. For example, a memory device will typically have only slave functions, while a DMA controller, disk controller or CPU may include both slave and master functions. Many other semiconductor devices, including I/O devices, disk controllers, or other special purpose devices such as high speed switches can be modified for use with the bus of this invention.

Each semiconductor device contains a set of internal registers, preferably including a device identification (device ID) register, a device-type descriptor register, control registers and other registers containing other information relevant to that type of device. In a preferred implementation, semiconductor devices connected to the bus contain registers which specify the memory addresses contained within that device and access-time registers which store a set of one or more delay times at which the device can or should be available to send or receive data.

Most of these registers can be modified and preferably are set as part of an initialization sequence that occurs when the system is powered up or reset. During the initialization sequence each device on the bus is assigned a unique device ID number, which is stored in the device ID register. A bus master can then use these device ID numbers to access and set appropriate registers in other devices, including access-time registers, control registers, and memory registers, to configure the system. Each slave may have one or several access-time registers (four in a preferred embodiment). In a preferred embodiment, one access-time register in each slave is permanently or semi-permanently programmed with a fixed value to facilitate certain control functions. A preferred implementation of an initialization sequence is described below in more detail.

All information sent between master devices and slave devices is sent over the external bus, which, for example, may be 8 bits wide. This is accomplished by defining a protocol whereby a master device, such as a microprocessor, seizes exclusive control of the external bus (i.e., becomes the bus master) and initiates a bus transaction by sending a request packet (a sequence of bytes comprising address and control information) to one or more slave devices on the bus. An address can consist of 16 to 40 or more bits according to the teachings of this invention. Each slave on the bus must decode the request packet to see if that slave needs to respond to the packet. The slave that the packet is directed to must then begin any internal processes needed to carry out the requested bus transaction at the requested time. The requesting master may also need to transact certain internal processes before the bus transaction begins. After a specified access time the slave(s) respond by returning one or more bytes (8 bits) of data or by storing information made available from the bus. More than one access time can be provided to allow different types of responses to occur at different times.
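
The decode-and-respond behavior described above can be sketched, under stated assumptions, as follows. The `RequestPacket` fields, the `Slave` class, the four example access-time values, and the choice to count the access time from the cycle in which the request is decoded are all hypothetical; they are illustrative stand-ins and not the packet format of FIG. 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestPacket:
    address: int             # bus address of the requested data
    access_time_select: int  # which access-time register the slave should use
    is_write: bool           # True for a write, False for a read


class Slave:
    """A slave with an address range and several access-time registers."""

    def __init__(self, base: int, size: int, access_times: list[int]):
        self.base = base
        self.size = size
        self.access_times = access_times  # delays in bus cycles

    def decode(self, packet: RequestPacket, cycle: int) -> Optional[int]:
        """Return the cycle at which this slave will access the bus,
        or None if the packet is not directed to this slave."""
        if not (self.base <= packet.address < self.base + self.size):
            return None
        return cycle + self.access_times[packet.access_time_select]


# Two hypothetical slaves with four access-time registers each.
slaves = [Slave(0x00000, 0x80000, [8, 16, 24, 32]),
          Slave(0x80000, 0x80000, [8, 16, 24, 32])]
packet = RequestPacket(address=0x80010, access_time_select=1, is_write=False)
responses = [s.decode(packet, cycle=100) for s in slaves]
```

Every slave sees the packet, but only the slave whose address range contains the requested address schedules a bus access.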

A request packet and the corresponding bus access are separated by a selected number of bus cycles, allowing the bus to be used in the intervening bus cycles by the same or other masters for additional requests or brief bus accesses. Thus multiple, independent accesses are permitted, allowing maximum utilization of the bus for transfer of short blocks of data. Transfers of long blocks of data use the bus efficiently even without overlap because the overhead due to bus address, control and access times is small compared to the total time to request and transfer the block.

Device Address Mapping

Another unique aspect of this invention is that each memory device is a complete, independent memory subsystem with all the functionality of a prior art memory board in a conventional backplane-bus computer system. Individual memory devices may contain a single memory section or may be subdivided into more than one discrete memory section. Memory devices preferably include memory address registers for each discrete memory section. A failed memory device (or even a subsection of a device) can be "mapped out" with only the loss of a small fraction of the memory, maintaining essentially full system capability. Mapping out bad devices can be accomplished in two ways, both compatible with this invention.

The preferred method uses address registers in each memory device (or independent discrete portion thereof) to store information which defines the range of bus addresses to which this memory device will respond. This is similar to prior art schemes used in memory boards in conventional backplane bus systems. The address registers can include a single pointer, usually pointing to a block of known size, a pointer and a fixed or variable block size value or two pointers, one pointing to the beginning and one to the end (or to the "top" and "bottom") of each memory block. By appropriate settings of the address registers, a series of functional memory devices or discrete memory sections can be made to respond to a contiguous range of addresses, giving the system access to a contiguous block of good memory, limited primarily by the number of good devices connected to the bus. A block of memory in a first memory device or memory section can be assigned a certain range of addresses, then a block of memory in a next memory device or memory section can be assigned addresses starting with an address one higher (or lower, depending on the memory structure) than the last address of the previous block.

Preferred devices for use in this invention include device-type register information specifying the type of chip, including how much memory is available in what configuration on that device. A master can perform an appropriate memory test, such as reading and writing each memory cell in one or more selected orders, to test proper functioning of each accessible discrete portion of memory (based in part on information like device ID number and device-type) and write address values (up to 40 bits in the preferred embodiment, 10^12 bytes), preferably contiguous, into device address-space registers. Non-functional or impaired memory sections can be assigned a special address value which the system can interpret to avoid using that memory.
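
One possible way a master could fill in the address registers after testing memory is sketched below, using the pointer-plus-block-size form of address register. The `MemorySection` class, the use of `None` as the "special address value" for unusable sections, and the section sizes are assumptions for this example; the memory test itself is represented only by a `functional` flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemorySection:
    device_id: int
    size: int                   # bytes available in this section
    functional: bool = True     # result of the master's memory test
    base: Optional[int] = None  # address register; None marks "do not use"

    def responds_to(self, address: int) -> bool:
        """True if this section should respond to the given bus address."""
        return (self.base is not None
                and self.base <= address < self.base + self.size)


def assign_contiguous(sections: list[MemorySection], start: int = 0) -> int:
    """Give each functional section the next block of addresses.

    Returns the first address beyond the contiguous block of good memory.
    """
    next_base = start
    for section in sections:
        if section.functional:
            section.base = next_base
            next_base += section.size
        else:
            section.base = None  # mapped out
    return next_base


# Three hypothetical 512 Kbyte sections; the middle one failed its test.
sections = [MemorySection(0, 0x80000),
            MemorySection(1, 0x80000, functional=False),
            MemorySection(2, 0x80000)]
top = assign_contiguous(sections)
```

The failed section is skipped, and the two good sections together present one contiguous 1 Mbyte range to the rest of the system.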

The second approach puts the burden of avoiding the bad devices on the system master or masters. CPUs and DMA controllers typically have some sort of translation look-aside buffers (TLBs) which map virtual to physical (bus) addresses. With relatively simple software, the TLBs can be programmed to use only working memory (data structures describing functional memories are easily generated). For masters which don't contain TLBs (for example, a video display generator), a small, simple RAM can be used to map a contiguous range of addresses onto the addresses of the functional memory devices.
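
For a master without TLBs, the small mapping RAM mentioned above might behave like the lookup table sketched here. The fixed block size, the list-based table, and the function names are assumptions for illustration only.

```python
def build_remap_table(block_bases: list[int],
                      functional: list[bool]) -> list[int]:
    """Keep only the physical base addresses of functional blocks.

    Entry i of the result is the physical base of logical block i.
    """
    return [base for base, ok in zip(block_bases, functional) if ok]


def translate(table: list[int], block_size: int, logical_address: int) -> int:
    """Map a contiguous logical address to a physical (bus) address."""
    index, offset = divmod(logical_address, block_size)
    return table[index] + offset


# Four hypothetical 4 Kbyte blocks; the second one is non-functional.
BLOCK_SIZE = 0x1000
table = build_remap_table([0x0000, 0x1000, 0x2000, 0x3000],
                          [True, False, True, True])
physical = translate(table, BLOCK_SIZE, 0x1004)
```

Logical block 1 lands on the third physical block, so the master sees a contiguous address range even though one block has been mapped out.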

Either scheme works and permits a system to have a significant percentage of non-functional devices and still continue to operate with the memory which remains. This means that systems built with this invention will have much improved reliability over existing systems, including the ability to build systems with almost no field failures.

Bus

The preferred bus architecture of this invention comprises 11 signals: BusData[0:7]; AddrValid; Clk1 and Clk2; plus an input reference level and power and ground lines connected in parallel to each device. Signals are driven onto the bus during conventional bus cycles. The notation "Signal[i:j]" refers to a specific range of signals or lines, for example, BusData[0:7] means BusData0, BusData1, . . . , BusData7. The bus lines for BusData[0:7] signals form a byte-wide, multiplexed data/address/control bus. AddrValid is used to indicate when the bus is holding a valid address request, and instructs a slave to decode the bus data as an address and, if the address is included on that slave, to handle the pending request. The two clocks together provide a synchronized, high speed clock for all the devices on the bus. In addition to the bused signals, there is one other line (ResetIn, ResetOut) connecting each device in series for use during initialization to assign every device in the system a unique device ID number (described below in detail).

To facilitate the extremely high data rate of this external bus relative to the gate delays of the internal logic, the bus cycles are grouped into pairs of even/odd cycles. Note that all devices connected to a bus should preferably use the same even/odd labeling of bus cycles and preferably should begin operations on even cycles. This is enforced by the clocking scheme.

Protocol and Bus Operation

The bus uses a relatively simple, synchronous, split-transaction, block-oriented protocol for bus transactions. One of the goals of the system is to keep the intelligence concentrated in the masters, thus keeping the slaves as simple as possible (since there are typically many more slaves than masters). To reduce the complexity of the slaves, a slave should preferably respond to a request in a specified time, sufficient to allow the slave to begin or possibly complete a device-internal phase including any internal actions that must precede the subsequent bus access phase. The time for this bus access phase is known to all devices on the bus--each master being responsible for making sure that the bus will be free when the bus access begins. Thus the slaves never worry about arbitrating for the bus. This approach eliminates arbitration in single master systems, and also makes the slave-bus interface simpler.

In a preferred implementation of the invention, to initiate a bus transfer over the bus, a master sends out a request packet, a contiguous series of bytes containing address and control information. It is preferable to use a request packet containing an even number of bytes and also preferable to start each packet on an even bus cycle.

The device-select function is handled using the bus data lines. AddrValid is driven, which instructs all slaves to decode the request packet address, determine whether they contain the requested address, and if they do, provide the data back to the master (in the case of a read request) or accept data from the master (in the case of a write request) in a data block transfer. A master can also select a specific device by transmitting a device ID number in a request packet. In a preferred implementation, a special device ID number is chosen to indicate that the packet should be interpreted by all devices on the bus. This allows a master to broadcast a message, for example to set a selected control register of all devices with the same value.

The data block transfer occurs later at a time specified in the request packet control information, preferably beginning on an even cycle. A device begins a data block transfer almost immediately with a device-internal phase as the device initiates certain functions, such as setting up memory addressing, before the bus access phase begins. The time after which a data block is driven onto the bus lines is selected from values stored in slave access-time registers. The timing of data for reads and writes is preferably the same; the only difference is which device drives the bus. For reads, the slave drives the bus and the master latches the values from the bus. For writes the master drives the bus and the selected slave latches the values from the bus.

In a preferred implementation of this invention shown in FIG. 4, a request packet 22 contains 6 bytes of data--4.5 address bytes and 1.5 control bytes. Each request packet uses all nine bits of the multiplexed data/address lines (AddrValid 23 + BusData[0:7] 24) for all six bytes of the request packet. Setting AddrValid 23 to 1 in an otherwise unused even cycle indicates the start of a request packet (control information). In a valid request packet, AddrValid 27 must be 0 in the last byte. Asserting this signal in the last byte invalidates the request packet. This is used for the collision detection and arbitration logic (described below). Bytes 25-26 contain the first 36 address bits, Address[0:35]. The last byte contains AddrValid 27 (the invalidation switch) and 28, the remaining address bits, Address[36:39], and BlockSize[0:3] (control information).

The first byte contains two 4 bit fields containing control information, AccessType[0:3], an op code (operation code) which, for example, specifies the type of access, and Master[0:3], a position reserved for the master sending the packet to include its master ID number. Only master numbers 1 through 15 are allowed--master number 0 is reserved for special system commands. Any packet with Master[0:3] =0 is an invalid or special packet and is treated accordingly.
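The six-byte request packet can be sketched as six nine-bit bus words. The text fixes which fields travel in which byte, but not the bit positions within a byte, so the layout below is an assumption made only for illustration: AddrValid is bit 8 of each word, AccessType occupies the high nibble and Master the low nibble of the first byte, the four middle words carry Address[0:35] nine bits at a time with the least significant bits first, and the last byte carries Address[36:39] in its high nibble and BlockSize in its low nibble. The function names are hypothetical.

```python
from typing import List, Tuple

ADDR_VALID = 1 << 8  # ninth line of each bus word (assumed bit position)


def pack_request(access_type: int, master: int, address: int,
                 block_size: int) -> List[int]:
    """Build six 9-bit bus words for one request packet (assumed layout)."""
    if not 0 <= access_type <= 15:
        raise ValueError("AccessType is a 4-bit field")
    if not 1 <= master <= 15:
        raise ValueError("master numbers 1 through 15 only; 0 is reserved")
    if not 0 <= address < (1 << 40):
        raise ValueError("address is at most 40 bits")
    if not 0 <= block_size <= 15:
        raise ValueError("BlockSize is a 4-bit field")
    words = [ADDR_VALID | (access_type << 4) | master]  # start of packet
    for i in range(4):                                   # 4 x 9 = 36 address bits
        words.append((address >> (9 * i)) & 0x1FF)
    words.append(((address >> 36) << 4) | block_size)    # AddrValid = 0 here
    return words


def unpack_request(words: List[int]) -> Tuple[int, int, int, int]:
    """Decode a packet, rejecting one invalidated in its last byte."""
    if len(words) != 6:
        raise ValueError("a request packet is six bytes long")
    if not words[0] & ADDR_VALID:
        raise ValueError("AddrValid must be 1 in the first byte")
    if words[5] & ADDR_VALID:
        raise ValueError("packet invalidated: AddrValid set in last byte")
    access_type = (words[0] >> 4) & 0xF
    master = words[0] & 0xF
    address = 0
    for i in range(4):
        address |= (words[1 + i] & 0x1FF) << (9 * i)
    address |= ((words[5] >> 4) & 0xF) << 36
    return access_type, master, address, words[5] & 0xF
```

A slave following this sketch would defer any irrecoverable action until `unpack_request` has seen the last word, since a master may invalidate the packet by asserting AddrValid there.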

The AccessType field specifies whether the requested operation is a read or write and the type of access, for example, whether it is to the control registers or other parts of the device, such as memory. In a preferred implementation, AccessType[0] is a Read/Write switch: if it is a 1, then the operation calls for a read from the slave (the slave to read the requested memory block and drive the memory contents onto the bus); if it is a 0, the operation calls for a write into the slave (the slave to read data from the bus and write it to memory). AccessType[1:3] provides up to 8 different access types for a slave. AccessType[1:2] preferably indicates the timing of the response, which is stored in an access-time register, AccessRegN. The choice of access-time register can be selected directly by having a certain op code select that register, or indirectly by having a slave respond to selected op codes with pre-selected access times (see table below). The remaining bit, AccessType[3] may be used to send additional information about the request to the slaves.

One special type of access is control register access, which involves addressing a selected register in a selected slave. In the preferred implementation of this invention, AccessType[1:3] equal to zero indicates a control register request and the address field of the packet indicates the desired control register. For example, the most significant two bytes can be the device ID number (specifying which slave is being addressed) and the least significant three bytes can specify a register address and may also represent or include data to be loaded into that control register. Control register accesses are used to initialize the access-time registers, so it is preferable to use a fixed response time which can be preprogrammed or even hard wired, for example the value in AccessReg0, preferably 8 cycles. Control register access can also be used to initialize or modify other registers, including address registers.

The method of this invention provides for access mode control specifically for the DRAMs. One such access mode determines whether the access is page mode or normal RAS access. In normal mode (in conventional DRAMS and in this invention), the DRAM column sense amps or latches have been precharged to a value intermediate between logical 0 and 1. This precharging allows access to a row in the RAM to begin as soon as the access request for either inputs (writes) or outputs (reads) is received and allows the column sense amps to sense data quickly. In page mode (both conventional and in this invention), the DRAM holds the data in the column sense amps or latches from the previous read or write operation. If a subsequent request to access data is directed to the same row, the DRAM does not need to wait for the data to be sensed (it has been sensed already) and access time for this data is much shorter than the normal access time. Page mode generally allows much faster access to data but to a smaller block of data (equal to the number of sense amps). However, if the requested data is not in the selected row, the access time is longer than the normal access time, since the request must wait for the RAM to precharge before the normal mode access can start. Two access-time registers in each DRAM preferably contain the access times to be used for normal and for page-mode accesses, respectively.

The access mode also determines whether the DRAM should precharge the sense amplifiers or should save the contents of the sense amps for a subsequent page mode access. Typical settings are "precharge after normal access" and "save after page mode access" but "precharge after page mode access" or "save after normal access" are allowed, selectable modes of operation. The DRAM can also be set to precharge the sense amps if they are not accessed for a selected period of time.

In page mode, the data stored in the DRAM sense amplifiers may be accessed within much less time than it takes to read out data in normal mode (approximately 10-20 ns vs. 40-100 ns). This data may be kept available for long periods. However, if these sense amps (and hence bit lines) are not precharged after an access, a subsequent access to a different memory word (row) will suffer a precharge time penalty of about 40-100 ns because the sense amps must precharge before latching in a new value.

The contents of the sense amps thus may be held and used as a cache, allowing faster, repetitive access to small blocks of data. DRAM-based page-mode caches have been attempted in the prior art using conventional DRAM organizations but they are not very effective because several chips are required per computer word. Such a conventional page-mode cache contains many bits (for example, 32 chips x 4 Kbits) but has very few independent storage entries. In other words, at any given point in time the sense amps hold only a few different blocks or memory "locales" (a single block of 4K words, in the example above). Simulations have shown that upwards of 100 blocks are required to achieve high hit rates (>90% of requests find the requested data already in cache memory) regardless of the size of each block. See, for example, Anant Agarwal, et al., "An Analytic Cache Model," ACM Transactions on Computer Systems, Vol. 7(2), pp. 184-215 (May 1989).

The organization of memory in the present invention allows each DRAM to hold one or more (4 for 4 MBit DRAMS) separately-addressed and independent blocks of data. A personal computer or workstation with 100 such DRAMs (i.e. 400 blocks or locales) can achieve extremely high, very repeatable hit rates (98-99% on average) as compared to the lower (50-80%), widely varying hit rates using DRAMS organized in the conventional fashion. Further, because of the time penalty associated with the deferred precharge on a "miss" of the page-mode cache, the conventional DRAM-based page-mode cache generally has been found to work less well than no cache at all.

For DRAM slave access, the access types are preferably used in the following way:

______________________________________
AccessType[1:3]  Use                      AccessTime
______________________________________
0                Control Register Access  Fixed, 8 [AccessReg0]
1                Unused                   Fixed, 8 [AccessReg0]
2-3              Unused                   AccessReg1
4-5              Page Mode DRAM access    AccessReg2
6-7              Normal DRAM access       AccessReg3
______________________________________
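The table above can be expressed as a small lookup. This is a sketch that assumes AccessType[1:3] is read as a three-bit number with AccessType[3] as its least significant bit, so that the two bits AccessType[1:2] select the access-time register directly; the function name `decode_access_type` is hypothetical.

```python
from typing import Tuple

# Use of each AccessType[1:3] value, taken from the table above.
ACCESS_USE = {
    0: "control register access",
    1: "unused",
    2: "unused",
    3: "unused",
    4: "page mode DRAM access",
    5: "page mode DRAM access",
    6: "normal DRAM access",
    7: "normal DRAM access",
}


def decode_access_type(value: int) -> Tuple[str, str]:
    """Return (use, access-time register name) for AccessType[1:3]."""
    if value not in ACCESS_USE:
        raise ValueError("AccessType[1:3] is a 3-bit field")
    # Dropping the low bit leaves AccessType[1:2], the register selector.
    return ACCESS_USE[value], f"AccessReg{value >> 1}"
```

For example, values 4 and 5 both select AccessReg2, leaving the low bit free to carry additional information such as a precharge/save-data choice.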

Persons skilled in the art will recognize that a series of available bits could be designated as switches for controlling these access modes. For example:

______________________________________
AccessType[2] = page mode/normal switch
AccessType[3] = precharge/save-data switch
______________________________________

BlockSize[0:3] specifies the size of the data block transfer. If BlockSize[0] is 0, the remaining bits are the binary representation of the block size (0-7). If BlockSize[0] is 1, then the remaining bits give the block size as a binary power of 2, from 8 to 1024. A zero-length block can be interpreted as a special command, for example, to refresh a DRAM without returning any data, or to change the DRAM from page mode to normal access mode or vice-versa.

______________________________________
BlockSize[0:3]   Number of Bytes in Block
______________________________________
0-7              0-7 respectively
8                8
9                16
10               32
11               64
12               128
13               256
14               512
15               1024
______________________________________
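The block size encoding in the table can be decoded with a few lines. This sketch assumes the four-bit field is read as a number from 0 to 15 with BlockSize[0] as the most significant bit; the function name is hypothetical.

```python
def decode_block_size(code: int) -> int:
    """Return the number of bytes in the block for a 4-bit BlockSize code."""
    if not 0 <= code <= 15:
        raise ValueError("BlockSize is a 4-bit field")
    if code < 8:
        # High bit clear: the remaining bits are the size itself (0-7).
        return code
    # High bit set: the remaining bits select a power of two, 8 to 1024.
    return 1 << (code - 5)
```

Code 8 gives 2^3 = 8 bytes and code 15 gives 2^10 = 1024 bytes, matching the two ends of the table.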

Persons skilled in the art will recognize that other block size encoding schemes or values can be used.

In most cases, a slave will respond at the selected access time by reading or writing data from or to the bus over bus lines BusData[0:7] and AddrValid will be at logical 0. In a preferred embodiment, substantially each memory access will involve only a single memory device, that is, a single block will be read from or written to a single memory device.

Retry Format

In some cases, a slave may not be able to respond correctly to a request, e.g., for a read or write. In such a situation, the slave should return an error message, sometimes called a N(o)ACK(nowledge) or retry message. The retry message can include information about the condition requiring a retry, but this increases system requirements for circuitry in both slave and masters. A simple message indicating only that an error has occurred allows for a less complex slave, and the master can take whatever action is needed to understand and correct the cause of the error.

For example, under certain conditions a slave might not be able to supply the requested data. During a page-mode access, the DRAM selected must be in page mode and the requested address must match the address of the data held in the sense amps or latches. Each DRAM can check for this match during a page-mode access. If no match is found, the DRAM begins precharging and returns a retry message to the master during the first cycle of the data block (the rest of the returned block is ignored). The master then must wait for the precharge time (which is set to accommodate the type of slave in question, stored in a special register, PreChargeReg), and then resend the request as a normal DRAM access (AccessType=6 or 7).

In the preferred form of the present invention, a slave signals a retry by driving AddrValid true at the time the slave was supposed to begin reading or writing data. A master which expected to write to that slave must monitor AddrValid during the write and take corrective action if it detects a retry message. FIG. 5 illustrates the format of a retry message 28 which is useful for read requests, consisting of 23 AddrValid=1 with Master[0:3] =0 in the first (even) cycle. Note that AddrValid is normally 0 for data block transfers and that there is no master 0 (only 1 through 15 are allowed). All DRAMs and masters can easily recognize such a packet as an invalid request packet, and therefore a retry message. In this type of bus transaction all of the fields except for Master[0:3] and AddrValid 23 may be used as information fields, although in the implementation described, the contents are undefined. Persons skilled in the art recognize that another method of signifying a retry message is to add a DataInvalid line and signal to the bus. This signal could be asserted in the case of a NACK.

Bus Arbitration

In the case of a single master, there are by definition no arbitration problems. The master sends request packets and keeps track of periods when the bus will be busy in response to that packet. The master can schedule multiple requests so that the corresponding data block transfers do not overlap.

The bus architecture of this invention is also useful in configurations with multiple masters. When two or more masters are on the same bus, each master must keep track of all the pending transactions, so each master knows when it can send a request packet and access the corresponding data block transfer. Situations will arise, however, where two or more masters send a request packet at about the same time and the multiple requests must be detected, then sorted out by some sort of bus arbitration.

There are many ways for each master to keep track of when the bus is and will be busy. A simple method is for each master to maintain a bus-busy data structure, for example by maintaining two pointers, one to indicate the earliest point in the future when the bus will be busy and the other to indicate the earliest point in the future when the bus will be free, that is, the end of the latest pending data block transfer. Using this information, each master can determine whether and when there is enough time to send a request packet (as described above under Protocol) before the bus becomes busy with another data block transfer and whether the corresponding data block transfer will interfere with pending bus transactions. Thus each master must read every request packet and update its bus-busy data structure to maintain information about when the bus is and will be free.
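One possible bus-busy data structure is sketched below. Instead of the two pointers mentioned above, it keeps a list of reserved cycle intervals, which is an assumption chosen for clarity rather than the described implementation. It also assumes a six-cycle request packet, an access delay counted in bus cycles from the end of the packet, and one byte per bus cycle. The class and method names are hypothetical.

```python
from typing import List, Tuple


class BusBusy:
    """Tracks which bus cycles are already claimed by packets or data blocks."""

    def __init__(self) -> None:
        self.reserved: List[Tuple[int, int]] = []  # half-open [start, end)

    def is_free(self, start: int, length: int) -> bool:
        end = start + length
        return all(end <= s or start >= e for s, e in self.reserved)

    def reserve(self, start: int, length: int) -> None:
        if not self.is_free(start, length):
            raise ValueError("interval overlaps a pending bus transaction")
        self.reserved.append((start, start + length))

    def retire(self, now: int) -> None:
        """Forget intervals that have completely passed."""
        self.reserved = [(s, e) for s, e in self.reserved if e > now]

    def try_request(self, now: int, access_delay: int, block_len: int,
                    packet_len: int = 6) -> bool:
        """Reserve a packet slot and its data block slot if both fit."""
        if now % 2:
            return False  # request packets start on even cycles
        data_start = now + packet_len + access_delay
        if self.is_free(now, packet_len) and self.is_free(data_start, block_len):
            self.reserve(now, packet_len)
            self.reserve(data_start, block_len)
            return True
        return False
```

In a multiple-master system each master would feed every request packet it observes on the bus into its own copy of such a structure, so that all copies agree on when the bus is free.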

With two or more masters on the bus, masters will occasionally transmit independent request packets during the same bus cycle. Those multiple requests will collide as each such master drives the bus simultaneously with different information, resulting in scrambled request information and neither desired data block transfer. In a preferred form of the invention, each device on the bus seeking to write a logical 1 on a BusData or AddrValid line drives that line with a current sufficient to sustain a voltage greater than or equal to the high-logic value for the system. Devices do not drive lines that should have a logical 0; those lines are simply held at a voltage corresponding to a low-logic value. Each master tests the voltage on at least some, preferably all, bus data and the AddrValid lines so the master can detect a logical `1` where the expected level is `0` on a line that it does not drive during a given bus cycle but another master does drive.

Another way to detect collisions is to select one or more bus lines for collision signalling. Each master sending a request drives that line or lines and monitors the selected lines for more than the normal drive current (or a logical value of ">1"), indicating requests by more than one master. Persons skilled in the art will recognize that this can be implemented with a protocol involving BusData and AddrValid lines or could be implemented using an additional bus line.

In the preferred form of this invention, each master detects collisions by monitoring lines which it does not drive to see if another master is driving those lines. Referring to FIG. 4, the first byte of the request packet includes the number of each master attempting to use the bus (Master[0:3]). If two masters send packet requests starting at the same point in time, the master numbers will be logical "or"ed together by at least those masters, and thus one or both of the masters, by monitoring the data on the bus and comparing what it sent, can detect a collision. For instance if requests by masters number 2 (0010) and 5 (0101) collide, the bus will be driven with the value Master[0:3]=7 (0010 OR 0101 = 0111). Master number 5 will detect that the signal Master[2]=1 and master 2 will detect that Master[1] and Master[3]=1, telling both masters that a collision has occurred. Another example is masters 2 and 11, for which the bus will be driven with the value Master[0:3]=11 (0010 OR 1011 = 1011), and although master 11 can't readily detect this collision, master 2 can. When any collision is detected, each master detecting a collision drives the value of AddrValid 27 in byte 5 of the request packet 22 to 1, which is detected by all masters, including master 11 in the second example above, and forces a bus arbitration cycle, described below.
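The two examples above can be reproduced with a short model of the Master[0:3] field, treating the bus as a logical OR of every value driven onto it. The function names are hypothetical, and the model covers only the master-number nibble of the first byte.

```python
from typing import Dict, Sequence


def wired_or(masters: Sequence[int]) -> int:
    """Value seen on the bus when several masters drive their numbers."""
    value = 0
    for number in masters:
        value |= number
    return value


def detect_collision(masters: Sequence[int]) -> Dict[int, bool]:
    """For each sending master, whether it sees a 1 on a line it did not drive."""
    bus = wired_or(masters)
    return {number: bus != number for number in masters}
```

`detect_collision([2, 5])` reports that both masters notice the collision, while `detect_collision([2, 11])` reports that only master 2 does, which is why a detecting master asserts AddrValid in byte 5 to inform the others.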

Another collision condition may arise where master A sends a request packet in cycle 0 and master B tries to send a request packet starting in cycle 2 of the first request packet, thereby overlapping the first request packet. This will occur from time to time because the bus operates at high speeds, thus the logic in a second-initiating master may not be fast enough to detect a request initiated by a first master in cycle 0 and to react fast enough by delaying its own request. Master B eventually notices that it wasn't supposed to try to send a request packet (and consequently almost surely destroyed the address that master A was trying to send), and, as in the example above of a simultaneous collision, drives a 1 on AddrValid during byte 5 of the first request packet 27 forcing an arbitration. The logic in the preferred implementation is fast enough that a master should detect a request packet by another master by cycle 3 of the first request packet, so no master is likely to attempt to send a potentially colliding request packet later than cycle 2.

Slave devices do not need to detect a collision directly, but they must wait to do anything irrecoverable until the last byte (byte 5) is read to ensure that the packet is valid. A request packet with Master[0:3] equal to 0 (a retry signal) is ignored and does not cause a collision. The subsequent bytes of such a packet are ignored.

To begin arbitration after a collision, the masters wait a preselected number of cycles after the aborted request packet (4 cycles in a preferred implementation), then use the next free cycle to arbitrate for the bus (the next available even cycle in the preferred implementation). Each colliding master signals to all other colliding masters that it seeks to send a request packet, a priority is assigned to each of the colliding masters, then each master is allowed to make its request in the order of that priority.

FIG. 6 illustrates one preferred way of implementing this arbitration. Each colliding master signals its intent to send a request packet by driving a single BusData line during a single bus cycle corresponding to its assigned master number (1-15 in the present example). During two-byte arbitration cycle 29, byte 0 is allocated to requests 1-7 from masters 1-7, respectively, (bit 0 is not used) and byte 1 is allocated to requests 8-15 from masters 8-15, respectively. At least one device and preferably each colliding master reads the values on the bus during the arbitration cycles to determine and store which masters desire to use the bus. Persons skilled in the art will recognize that a single byte can be allocated for arbitration requests if the system includes more bus lines than masters. More than 15 masters can be accommodated by using additional bus cycles.

A fixed priority scheme (preferably using the master numbers, selecting lowest numbers first) is then used to prioritize, then sequence the requests in a bus arbitration queue which is maintained by at least one device. These requests are queued by each master in the bus-busy data structure and no further requests are allowed until the bus arbitration queue is cleared. Persons skilled in the art will recognize that other priority schemes can be used, including assigning priority according to the physical location of each master.
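The two-byte arbitration cycle and the fixed, lowest-number-first priority can be sketched as follows. The mapping of master n to bit n mod 8 of byte n div 8 is an assumption consistent with "bit 0 is not used" in byte 0; the function names are hypothetical.

```python
from typing import Iterable, List


def arbitration_bytes(colliding: Iterable[int]) -> List[int]:
    """Bus values for the two arbitration cycles, one bit per requesting master."""
    cycle = [0, 0]
    for number in colliding:
        if not 1 <= number <= 15:
            raise ValueError("master numbers 1 through 15 only")
        cycle[number // 8] |= 1 << (number % 8)
    return cycle


def arbitration_queue(cycle: List[int]) -> List[int]:
    """Masters that requested the bus, ordered lowest number first."""
    return [byte * 8 + bit
            for byte in range(2)
            for bit in range(8)
            if (cycle[byte] >> bit) & 1]
```

Because every master reads the same two bytes, each can build an identical queue and wait its turn without further signalling.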

System Configuration/Reset

In the bus-based system of this invention, a mechanism is provided to give each device on the bus a unique device identifier (device ID) after power-up or under other conditions as desired or needed by the system. A master can then use this device ID to access a specific device, particularly to set or modify registers of the specified device, including the control and address registers. In the preferred embodiment, one master is assigned to carry out the entire system configuration process. The master provides a series of unique device ID numbers for each unique device connected to the bus system. In the preferred embodiment, each device connected to the bus contains a special device-type register which specifies the type of device, for instance CPU, 4 MBit memory, 64 MBit memory or disk controller. The configuration master should check each device, determine the device type and set appropriate control registers, including access-time registers. The configuration master should check each memory device and set all appropriate memory address registers.

One means to set up unique device ID numbers is to have each device select a device ID in sequence and store the value in an internal device ID register. For example, a master can pass sequential device ID numbers through shift registers in each of a series of devices, or pass a token from device to device whereby the device with the token reads in device ID information from another line or lines. In a preferred embodiment, device ID numbers are assigned to devices according to their physical relationship, for instance, their order along the bus.

In a preferred embodiment of this invention, the device ID setting is accomplished using a pair of pins on each device, ResetIn and ResetOut. These pins handle normal logic signals and are used only during device ID configuration. On each rising edge of the clock, each device copies ResetIn (an input) into a four-stage reset shift register. The output of the reset shift register is connected to ResetOut, which in turn connects to ResetIn for the next sequentially connected device. Substantially all devices on the bus are thereby daisy-chained together. A first reset signal, for example, while ResetIn at a device is a logical 1, or when a selected bit of the reset shift register goes from zero to non-zero, causes the device to hard reset, for example by clearing all internal registers and resetting all state machines. A second reset signal, for example, the falling edge of ResetIn combined with changeable values on the external bus, causes that device to latch the contents of the external bus into the internal device ID register (Device[0:7]).

To reset all devices on a bus, a master sets the ResetIn line of the first device to a "1" for long enough to ensure that all devices on the bus have been reset (4 cycles times the number of devices--note that the maximum number of devices on the preferred bus configuration is 256 (8 bits), so that 1024 cycles is always enough time to reset all devices.) Then ResetIn is dropped to "0" and the BusData lines are driven with the first followed by successive device ID numbers, changing after every 4 clock pulses. Successive devices set those device ID numbers into the corresponding device ID register as the falling edge of ResetIn propagates through the shift registers of the daisy-chained devices. FIG. 14 shows ResetIn at a first device going low while a master drives a first device ID onto the bus data lines BusData[0:3]. The first device then latches in that first device ID. After four clock cycles, the master changes BusData[0:3] to the next device ID number and ResetOut at the first device goes low, which pulls ResetIn for the next daisy-chained device low, allowing the next device to latch in the next device ID number from BusData[0:3]. In the preferred embodiment, one master is assigned device ID 0 and it is the responsibility of that master to control the ResetIn line and to drive successive device ID numbers onto the bus at the appropriate times. In the preferred embodiment, each device waits two clock cycles after ResetIn goes low before latching in a device ID number from BusData[0:3].
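The daisy-chained reset sequence can be simulated cycle by cycle. The sketch below is a simplified model, not the circuit of FIG. 14: it assumes an idealized clock, a four-stage shift register per device, a latch exactly two cycles after each device samples ResetIn low, and slave IDs starting at 1 because the configuring master holds ID 0. The names `Device` and `configure` are hypothetical.

```python
from typing import List, Optional

STAGES = 4        # length of the reset shift register in each device
LATCH_DELAY = 2   # cycles a device waits after ResetIn falls


class Device:
    def __init__(self) -> None:
        self.shift = [0] * STAGES
        self.prev_in = 0
        self.latch_at: Optional[int] = None
        self.device_id: Optional[int] = None

    @property
    def reset_out(self) -> int:
        return self.shift[-1]

    def clock(self, t: int, reset_in: int, bus: int) -> None:
        if reset_in:
            self.device_id = None             # hard reset while ResetIn is 1
            self.latch_at = None
        elif self.prev_in:
            self.latch_at = t + LATCH_DELAY   # falling edge of ResetIn seen
        if self.latch_at == t:
            self.device_id = bus              # latch the ID now on the bus
        self.shift = [reset_in] + self.shift[:-1]
        self.prev_in = reset_in


def configure(n_devices: int, first_id: int = 1) -> List[Optional[int]]:
    devices = [Device() for _ in range(n_devices)]
    hold = STAGES * n_devices                 # cycles with ResetIn held at 1
    total = hold + STAGES * n_devices + LATCH_DELAY
    for t in range(total):
        master_reset = 1 if t < hold else 0
        # The master steps to the next ID every STAGES clocks after the drop.
        bus = first_id + (t - hold) // STAGES if t >= hold else 0
        # Sample every ResetOut before any register shifts on this edge.
        inputs = [master_reset] + [d.reset_out for d in devices[:-1]]
        for dev, reset_in in zip(devices, inputs):
            dev.clock(t, reset_in, bus)
    return [d.device_id for d in devices]
```

In this model the falling edge reaches device k four cycles after device k-1, so each device latches in the middle of the four-cycle window during which its own ID is on the bus.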

Persons skilled in the art recognize that longer device ID numbers could be distributed to devices by having each device read in multiple bytes from the bus and latch the values into the device ID register. Persons skilled in the art also recognize that there are alternative ways of getting device ID numbers to unique devices. For instance, a series of sequential numbers could be clocked along the ResetIn line and at a certain time each device could be instructed to latch the current reset shift register value into the device ID register.

The configuration master should choose and set an access time in each access-time register in each slave to a period sufficiently long to allow the slave to perform an actual, desired memory access. For example, for a normal DRAM access, this time must be longer than the row address strobe (RAS) access time. If this condition is not met, the slave may not deliver the correct data. The value stored in a slave access-time register is preferably one-half the number of bus cycles for which the slave device should wait before using the bus in response to a request. Thus an access time value of `1` would indicate that the slave should not access the bus until at least two cycles after the last byte of the request packet has been received. The value of AccessReg0 is preferably fixed at 8 (cycles) to facilitate access to control registers.
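Choosing a value for an access-time register can be sketched as a small calculation. The register holds one-half the number of bus cycles to wait, so the required delay is rounded up to a whole number of cycle pairs. The bus cycle time is a parameter of this sketch, not a figure taken from the text, and the function names are hypothetical.

```python
import math


def access_register_value(required_ns: float, cycle_ns: float) -> int:
    """Smallest register value whose wait covers the required access time."""
    cycles = math.ceil(required_ns / cycle_ns)
    return (cycles + 1) // 2  # round up to a whole even/odd cycle pair


def wait_cycles(register_value: int) -> int:
    """Bus cycles a slave waits after the last byte of the request packet."""
    return 2 * register_value
```

A register value of 1 corresponds to a two-cycle wait, as stated above; a 100 ns access on an assumed 4 ns bus cycle needs 25 cycles, which rounds up to a register value of 13 (26 cycles).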

The bus architecture of this invention can include more than one master device. The reset or initialization sequence should also include a determination of whether there are multiple masters on the bus, and if so to assign unique master ID numbers to each. Persons skilled in the art will recognize that there are many ways of doing this. For instance, the master could poll each device to determine what type of device it is, for example, by reading a special register then, for each master device, write the next available master ID number into a special register.

ECC

Error detection and correction ("ECC") methods well known in the art can be implemented in this system. ECC information typically is calculated for a block of data at the time that block of data is first written into memory. The data block usually has an integral binary size, e.g. 256 bits, and the ECC information uses significantly fewer bits. A potential problem arises in that each binary data block in prior art schemes typically is stored with the ECC bits appended, resulting in a block size that is not an integral binary power.

In a preferred embodiment of this invention, ECC information is stored separately from the corresponding data, which can then be stored in blocks having integral binary size. ECC information and corresponding data can be stored, for example, in separate DRAM devices. Data can be read without ECC using a single request packet, but to write or read error-corrected data requires two request packets, one for the data and a second for the corresponding ECC information. ECC information may not always be stored permanently and in some situations the ECC information may be available without sending a request packet or without a bus data block transfer.

In a preferred embodiment, a standard data block size can be selected for use with ECC, and the ECC method will determine the required number of bits of information in a corresponding ECC block. RAMs containing ECC information can be programmed to store an access time that is equal to: (1) the access time of the normal RAM (containing data) plus the time to access a standard data block (for corrected data) minus the time to send a request packet (6 bytes); or (2) the access time of a normal RAM minus the time to access a standard ECC block minus the time to send a request packet. To read a data block and the corresponding ECC block, the master simply issues a request for the data immediately followed by a request for the ECC block. The ECC RAM will wait for the selected access time then drive its data onto the bus right after (in case (1) above) the data RAM has finished driving out the data block. Persons skilled in the art will recognize that the access time described in case (2) above can be used to drive ECC data before the data is driven onto the bus lines and will recognize that writing data can be done by analogy with the method described for a read. Persons skilled in the art will also recognize the adjustments that must be made in the bus-busy structure and the request packet arbitration methods of this invention in order to accommodate these paired ECC requests.
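The two ECC access-time formulas can be checked with a short timing sketch. It assumes all times are counted in bus cycles, one byte is transferred per cycle, the data request packet starts at cycle 0 and the ECC request packet follows it immediately, and each access time is measured from the end of the corresponding request packet. The function names are hypothetical.

```python
from typing import Tuple

REQUEST_PACKET_CYCLES = 6

Interval = Tuple[int, int]  # half-open [start, end) in bus cycles


def ecc_access_after(data_access: int, data_block: int) -> int:
    """Case (1): ECC block follows the data block on the bus."""
    return data_access + data_block - REQUEST_PACKET_CYCLES


def ecc_access_before(data_access: int, ecc_block: int) -> int:
    """Case (2): ECC block precedes the data block on the bus."""
    return data_access - ecc_block - REQUEST_PACKET_CYCLES


def schedule(data_access: int, data_block: int, ecc_block: int,
             ecc_first: bool = False) -> Tuple[Interval, Interval]:
    """Return (data interval, ECC interval) for back-to-back requests."""
    data_start = REQUEST_PACKET_CYCLES + data_access
    if ecc_first:
        ecc_access = ecc_access_before(data_access, ecc_block)
    else:
        ecc_access = ecc_access_after(data_access, data_block)
    ecc_start = 2 * REQUEST_PACKET_CYCLES + ecc_access
    return ((data_start, data_start + data_block),
            (ecc_start, ecc_start + ecc_block))
```

With a data access time of 20 cycles, a 32-byte data block and a 4-byte ECC block, case (1) places the ECC block in the cycles immediately after the data block, and case (2) places it in the cycles immediately before.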

Since this system is quite flexible, the system designer can choose the size of the data blocks and the number of ECC bits using the memory devices of this invention. Note that the data stream on the bus can be interpreted in various ways. For instance, the sequence can be 2^n data bytes followed by 2^m ECC bytes (or vice versa), or the sequence can be 2^k iterations of 8 data bytes plus 1 ECC byte. Other information, such as information used by a directory-based cache coherence scheme, can also be managed this way. See, for example, Anant Agarwal, et al., "Scaleable Directory Schemes for Cache Consistency," 15th International Symposium on Computer Architecture, June 1988, pp. 280-289. Those skilled in the art will recognize alternative methods of implementing ECC schemes that are within the teachings of this invention.
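
The two interpretations of the bus data stream mentioned above can be sketched as follows. This is an illustration only. The function names are hypothetical, and the sketch separates data bytes from ECC bytes without computing or checking any error-correcting code.

```python
def split_block_then_ecc(stream, n, m):
    """Interpret the stream as 2**n data bytes followed by 2**m ECC bytes."""
    data_len, ecc_len = 2 ** n, 2 ** m
    if len(stream) != data_len + ecc_len:
        raise ValueError("stream length does not match 2**n + 2**m")
    return stream[:data_len], stream[data_len:]


def split_interleaved(stream, k):
    """Interpret the stream as 2**k groups of 8 data bytes plus 1 ECC byte."""
    groups = 2 ** k
    if len(stream) != groups * 9:
        raise ValueError("stream length does not match 2**k groups of 9 bytes")
    data, ecc = bytearray(), bytearray()
    for i in range(groups):
        group = stream[9 * i : 9 * i + 9]
        data += group[:8]      # first 8 bytes of the group are data
        ecc.append(group[8])   # ninth byte of the group is ECC
    return bytes(data), bytes(ecc)


# The same 18 bytes on the bus, read under each interpretation.
stream = bytes(range(18))
data_a, ecc_a = split_block_then_ecc(stream, n=4, m=1)  # 16 data + 2 ECC
data_b, ecc_b = split_interleaved(stream, k=1)          # 2 x (8 data + 1 ECC)
```

The memory devices transfer the same bytes in either case; only the master's interpretation of the sequence differs.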

Low Power 3-D Packaging

Another major advantage of this invention is that it drastically reduces the memory system power consumption. Nearly all the power consumed by a prior art DRAM is dissipated in performing row access. By using a single row access in a single RAM to supply all the bits for a block request (compared to a row access in each of multiple RAMs in conventional memory systems), the power per bit can be made very small. Since the power dissipated by memory devices using this invention is significantly reduced, the devices can potentially be placed much closer together than with conventional designs.
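
The power-per-bit argument above can be made concrete with a small sketch. It assumes, as the text suggests, that row access dominates the energy consumed, and it uses a normalized energy of 1.0 per row access. The function name, the count of 8 RAMs, and the 512-bit block are hypothetical values chosen only for illustration, not measurements.

```python
def row_energy_per_bit(row_accesses, bits_delivered, energy_per_row_access=1.0):
    """Normalized row-access energy spent per bit delivered to the requester."""
    return row_accesses * energy_per_row_access / bits_delivered


BLOCK_BITS = 512  # hypothetical block size

# Conventional arrangement: every one of 8 RAMs performs a row access.
conventional = row_energy_per_bit(row_accesses=8, bits_delivered=BLOCK_BITS)

# This arrangement: one RAM performs one row access and supplies the whole block.
single_ram = row_energy_per_bit(row_accesses=1, bits_delivered=BLOCK_BITS)

ratio = conventional / single_ram
```

Under these assumptions the energy per bit falls in proportion to the number of row accesses avoided.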

The bus architecture of this invention makes possible an innovative 3-D packaging technology. By using a narrow, multiplexed (time-shared) bus, the pin count for an arbitrarily large memory device can be kept quite small--on the order of 20 pins. Moreover, this pin count can be kept constant from one generation of DRAM density to the next. The low power dissipation allows each package to be smaller, with narrower pin pitches (spacing between the IC pins). With current surface mount technology supporting pin pitches as low as 20 mils, all off-device connections can be implemented on a single edge of the memory device. Semiconductor die useful in this invention preferably have connections or pads along one edge of the die which can then be wired or otherwise connected to the package pins with wires having similar lengths. This geometry also allows for very short leads, preferably with an effective lead length of less than 4 mm. Furthermore, this invention uses only bused interconnections, i.e., each pad on each device is connected by the bus to the corresponding pad of each other device.
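
As a rough check on the single-edge claim, the length of package edge occupied by the pins can be estimated from the figures given above (about 20 pins at a 20 mil pitch). The sketch assumes evenly spaced pins in a single row and ignores pin width and package margins. The function name is hypothetical.

```python
MM_PER_MIL = 0.0254  # 1 mil = 0.001 inch = 0.0254 mm


def edge_span_mm(pins, pitch_mils):
    """Distance from the first pin center to the last pin center, in mm."""
    return (pins - 1) * pitch_mils * MM_PER_MIL


span = edge_span_mm(pins=20, pitch_mils=20)  # 19 gaps of 20 mils = 380 mils
```

Under these assumptions the pins span about 9.65 mm, a length a single edge of a memory device package can plausibly accommodate.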

The use of a low pin count and an edge-connected bus permits a simple 3-D package, whereby the devices are stacked and the bus is connected along a single edge of the stack. The fact that all of the signals are bused is importan