United States Patent 7185293
Ofer, February 27, 2007

Title

Universal hardware device and method and tools for use therewith

Abstract

A universal hardware device including at least one plurality of cells for storing data; and at least one programmable matrix coupled to the at least one plurality of cells, whereby a plurality of hardware applications may be implemented by selectively storing data in the cells and selectively programming the matrix to connect at least one of the cells to at least one of the cells.


Inventors: Ofer; Meged (Netanya, IL)
Assignee: Cellot, Inc. (Wilmington, DE)
Appl. No.: 10/148,458
Filed: November 28, 2000
PCT File Date: November 28, 2000
PCT Pub Date: May 31, 2001

Current U.S. Class: 716/1; 716/16; 716/17
Current International Class: G06F 17/50 (20060101)
Field of Search: 716/1,2,7,16,17

U.S. Patent Documents
4984192   January 1991     Flynn
5359536   October 1994     Agrawal et al.
5457410   October 1995     Ting
5477475   December 1995    Sample et al.
5526278   June 1996        Powell
5543640   August 1996      Sutherland et al.
5621650   April 1997       Agrawal et al.
5778439   July 1998        Trimberger et al.
5815715   September 1998   Kucukcakar
5815726   September 1998   Cliff
5821773   October 1998     Norman et al.
5894228   April 1999       Reddy et al.
5909450   June 1999        Wright
5911059   June 1999        Profit, Jr.
5991907   November 1999    Stroud et al.
6018490   January 2000     Cliff et al.
6020759   February 2000    Heile
6026230   February 2000    Lin et al.
6028809   February 2000    Schleicher et al.
6031391   February 2000    Couts-Martin et al.
6034536   March 2000       McClintock et al.
6058452   May 2000         Rangasayee et al.
6058492   May 2000         Sample et al.
6069489   May 2000         Iwanczuk et al.
6078736   June 2000        Guccione
6085317   July 2000        Smith
6091258   July 2000        McClintock et al.
6097211   August 2000      Couts-Martin et al.
Foreign Patent Documents
2315 583      Feb. 1998   GB
2321 989      Aug. 1998   GB
WO 99/52049   Oct. 1999   WO
Other References
Mahapatra, N.R., et al., Hardware-Efficient and Highly-Reconfigurable 4- and 2-Track Fault-Tolerant Designs for Mesh-Connected Multicomputers, Proceedings of the 26th International Symposium on Fault-Tolerant Computing, Sendai, JP, Jun. 25-27, 1996, Proceedings of the International Symposium on Fault-Tolerant Computing, Los Alamitos, IEEE Comp. Soc. Press, US, vol. Conf. 26, Jun. 25, 1996, pp. 272-281, XP000679291, ISBN: 0-8186-7261-7. cited by other.
http://www.ti.com/sc/docs/asic/cad/cad.htm. cited by other.
http://www.versity.com/html/specbased.html. cited by other.
http://www.wsdmag.com/library/penton/archives/wsd/January1998/261.html. cited by other.
http://www.cstp.umkc.edu/personal/cjweber/spiral.html. cited by other.
U. Tietze, Ch. Schenk, "Halbleiter-Schaltungstechnik", 5th edition, Springer-Verlag Berlin, Heidelberg, New York, 1980, pp. 491, 492. cited by other.
International Preliminary Examination Report for corresponding PCT Application No. PCT/IL00/00797. cited by other.
Primary Examiner: Lin; Sun James
Attorney, Agent or Firm: Knobbe, Martens, Olson & Bear, LLP

Parent Case Text



RELATED APPLICATIONS

This application claims the benefit of the U.S. provisional application No. 60/167,684 filed Nov. 29, 1999 and the international application PCT/IL00/00797 filed Nov. 28, 2000.

Claims


The invention claimed is:
1. A universal hardware device comprising: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and each cell of the second plurality of cells has an architecture similar to the whole universal hardware device, and wherein at least one cell in the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality of cells and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells.
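The nesting recited in claim 1 (a cell that itself contains cells and a programmable matrix) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the classes `LeafCell` and `MatrixCell`, the use of a lookup table as the stored data, and the hand-written evaluation order. The sketch is purely combinational and does not model the direct port access to inner cells.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class LeafCell:
    """Innermost cell: stores data, modeled here as a small lookup table."""
    table: Dict[Tuple[int, ...], int]

    def evaluate(self, inputs: List[int]) -> int:
        return self.table[tuple(inputs)]


@dataclass
class MatrixCell:
    """A cell that itself contains cells plus a programmable matrix.

    `matrix` maps each inner cell name to the signal names feeding its inputs.
    A signal name is either "in0", "in1", ... (this cell's own inputs) or the
    name of another inner cell. `order` is a hand-written evaluation order,
    which is enough for a combinational sketch.
    """
    cells: Dict[str, Union[LeafCell, "MatrixCell"]]
    matrix: Dict[str, List[str]]
    order: List[str]
    output: str

    def evaluate(self, inputs: List[int]) -> int:
        signals = {f"in{i}": v for i, v in enumerate(inputs)}
        for name in self.order:
            sources = [signals[s] for s in self.matrix[name]]
            signals[name] = self.cells[name].evaluate(sources)
        return signals[self.output]


NAND = {(a, b): 1 - (a & b) for a in (0, 1) for b in (0, 1)}
NOT = {(0,): 1, (1,): 0}

# Second-level cells: four NAND tables wired by the second matrix into an XOR.
xor_cell = MatrixCell(
    cells={n: LeafCell(dict(NAND)) for n in ("n1", "n2", "n3", "n4")},
    matrix={
        "n1": ["in0", "in1"],
        "n2": ["in0", "n1"],
        "n3": ["in1", "n1"],
        "n4": ["n2", "n3"],
    },
    order=["n1", "n2", "n3", "n4"],
    output="n4",
)

# First-level cells: the nested XOR cell and an inverter, wired into an XNOR.
device = MatrixCell(
    cells={"x": xor_cell, "inv": LeafCell(dict(NOT))},
    matrix={"x": ["in0", "in1"], "inv": ["x"]},
    order=["x", "inv"],
    output="inv",
)

truth_table = {(a, b): device.evaluate([a, b]) for a in (0, 1) for b in (0, 1)}
```

Changing only the stored tables or the `matrix` entries yields a different function from the same structure, which is the sense in which several hardware applications share one device.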

2. The device according to claim 1, wherein the first programmable matrix and the second programmable matrix are integral.

3. The device according to claim 1, wherein at least some of the stored data is changeable dynamically by one or more of the plurality of hardware applications during execution thereof.

4. The device according to claim 1, further including a memory for storing therein a control setting of at least one of the first programmable matrix and the second programmable matrix.

5. The device according to claim 1, further including at least two memories, each for storing therein a respective setting of the first programmable matrix and/or the second programmable matrix, and a controlled multiplexer for selecting between the at least two memories whereby a topology of the universal hardware device can be rapidly changed by a single control.
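The arrangement of claim 5 can be sketched as two stored matrix settings with a single select value choosing between them. The names `CELL_DATA`, `SETTING_MEMORIES`, and `run`, and the encoding of a setting as a list of input indices, are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Data stored in a single two-input cell: f(a, b) = a AND (NOT b).
CELL_DATA: Dict[Tuple[int, int], int] = {(0, 0): 0, (0, 1): 0, (1, 0): 1, (1, 1): 0}

# Two memories, each holding one complete matrix setting.
# A setting lists which device input drives each cell input.
SETTING_MEMORIES: List[List[int]] = [
    [0, 1],  # memory 0: device in0 -> cell a, device in1 -> cell b
    [1, 0],  # memory 1: the two connections swapped
]


def run(select: int, device_inputs: List[int]) -> int:
    """Evaluate the cell under the matrix setting chosen by `select`."""
    setting = SETTING_MEMORIES[select]  # plays the role of the controlled multiplexer
    a, b = (device_inputs[src] for src in setting)
    return CELL_DATA[(a, b)]


out_with_memory_0 = run(0, [1, 0])
out_with_memory_1 = run(1, [1, 0])
```

The cell data is never rewritten; only `select` changes, so the topology switches with one control value.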

6. The device of claim 5, wherein at least some of the stored data is changeable dynamically by one or more of the plurality of hardware applications during execution thereof.

7. The device according to claim 1, having at least one input connected to at least one of the first programmable matrix and the second programmable matrix.

8. The device according to claim 1, having at least one output connected to at least one of the first programmable matrix and the second programmable matrix.

9. The device according to claim 1, wherein at least some of the cells in the first plurality of cells and in the second plurality of cells are synchronized by at least one clock signal each of which can be independently enabled or disabled via at least one of the first programmable matrix and the second programmable matrix.

10. The device according to claim 9, wherein the at least one clock signal is connected to at least one of the first programmable matrix and the second programmable matrix.

11. The device according to claim 1, wherein the cells in the first plurality of cells and in the second plurality of cells are connected by a pre-selected topology and are loaded with data to implement a desired hardware application.

12. The device according to claim 1, wherein at least some of the cells in the first plurality of cells and in the second plurality of cells are formed from constituent elements that are interconnected via at least one of the first programmable matrix and the second programmable matrix and which can be used independently.

13. The device of claim 1, wherein at least some of the cells in the first plurality of cells and in the second plurality of cells are synchronized by at least one clock signal each of which can be independently enabled or disabled via the at least one programmable matrix.

14. The device according to claim 1, wherein at least one of the cells in the first plurality of cells and in the second plurality of cells and the constituent elements are connected by a pre-selected topology and are loaded with data to implement a desired hardware application.

15. The device according to claim 1, wherein at least some of the cells in the first plurality of cells and in the second plurality of cells comprise: a random access memory (RAM) having an address bus for feeding thereto a required address so as to output on a data bus of the RAM a respective data value stored in a respective memory location of the RAM addressed thereby, a respective register coupled to each RAM for latching either the address bus such that an input to the register constitutes an input to the respective cell and an output of the RAM constitutes an output of the respective cell or for latching the data bus of the RAM such that an address to the RAM constitutes an input to the respective cell an output to the register constitutes an output of the respective cell, and auxiliary circuitry for modifying an operating characteristic of the RAM and the register; whereby the at least some of the cells in the first plurality of cells and in the second plurality of cells at least partially operate as a lookup table.
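The two latching options of claim 15 can be compared in a short behavioral sketch. The class `RamCell`, its `latch` parameter, and the table of squares are hypothetical; the auxiliary circuitry recited in the claim is not modeled.

```python
from typing import List


class RamCell:
    """RAM plus one register, latching either the address bus or the data bus."""

    def __init__(self, contents: List[int], latch: str) -> None:
        if latch not in ("address", "data"):
            raise ValueError("latch must be 'address' or 'data'")
        self.ram = list(contents)
        self.latch = latch
        self.register = 0
        self.cell_input = 0

    def set_input(self, value: int) -> None:
        self.cell_input = value

    def clock(self) -> None:
        if self.latch == "address":
            self.register = self.cell_input            # register holds the address
        else:
            self.register = self.ram[self.cell_input]  # register holds the data

    @property
    def output(self) -> int:
        if self.latch == "address":
            return self.ram[self.register]  # RAM output is the cell output
        return self.register                # register output is the cell output


squares = [n * n for n in range(8)]  # lookup table: address n -> n squared

addr_latched = RamCell(squares, latch="address")
addr_latched.set_input(3)
before_clock = addr_latched.output   # register still 0, so this reads RAM[0]
addr_latched.clock()
after_clock = addr_latched.output    # now reads RAM[3]

data_latched = RamCell(squares, latch="data")
data_latched.set_input(5)
data_latched.clock()
data_latched.set_input(2)            # address changes, latched data does not
held_value = data_latched.output
```

In both modes the cell behaves as a lookup table; the choice only moves the register to the input side or the output side of the RAM.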

16. The device according to claim 15, wherein at least some of the cells in the first plurality of cells and in the second plurality of cells are used to provide timesharing capability.

17. The device according to claim 16 wherein at least one of the at least some of the cells in the first plurality of cells and in the second plurality of cells are used as a timesharing counter and remaining ones of the at least some of the cells in the first plurality of cells and in the second plurality of cells are used to store an instantaneous state of a respective implementation of the device.
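One possible reading of the timesharing of claims 16 and 17 is sketched below: a counter selects which saved state the shared logic advances on each tick. The class `TimeSharedDevice`, the number of contexts, and the choice of a 2-bit counter as the shared logic are all assumptions, not details from the patent.

```python
from typing import List

NUM_CONTEXTS = 3   # hypothetical number of time-shared implementations
STATE_BITS = 2     # each implementation here is a 2-bit counter

# Shared logic, stored as lookup data: next state = (state + 1) mod 4.
NEXT_STATE: List[int] = [(s + 1) % (1 << STATE_BITS) for s in range(1 << STATE_BITS)]


class TimeSharedDevice:
    def __init__(self) -> None:
        self.slot = 0                    # the timesharing counter cell
        self.saved = [0] * NUM_CONTEXTS  # cells holding each instantaneous state

    def tick(self) -> None:
        """Advance the implementation selected by the counter, then move on."""
        self.saved[self.slot] = NEXT_STATE[self.saved[self.slot]]
        self.slot = (self.slot + 1) % NUM_CONTEXTS


dev = TimeSharedDevice()
for _ in range(7):
    dev.tick()
states_after_7_ticks = list(dev.saved)
```

After seven ticks the first context has been served three times and the other two twice each, so one copy of the logic serves three independent state machines.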

18. An assembly comprising: at least one device comprising a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and each cell of the second plurality of cells has an architecture similar to the whole device, and wherein at least one cell in the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality of cells and each of the at least one devices having an active and an inactive state, and wherein a host is coupled to the at least one device and has a memory which stores therein respective formatted data that must be loaded into each of the at least one device so as to allow the respective device to carry out a required operation when in the active state or to allow the host to load the respective device when in the inactive state and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells.

19. The assembly according to claim 18, wherein the host comprises: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole universal hardware device, and wherein at least one cell in the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality of cells.

20. The assembly according to claim 18, including at least two devices wherein the host is adapted to manage at least one task by activating as many of the devices as required for carrying out the at least one task.

21. A module including at least one hardware device comprising a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole hardware device, and wherein at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality wherein at least one of the cells in the first plurality of cells and in the second plurality of cells and constituent elements thereof are connected by a pre-selected topology and are loaded with data to implement a desired hardware application and at least one I/O interface and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells.

22. A method for designing a hardware application for a universal hardware device comprising: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and each of the second plurality of cells has an architecture similar to the whole universal hardware device, and in that at least one cell in the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality of cells and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells, the method comprising: (a) obtaining construction data representing a topology defining a connectivity between at least one of cells and constituent elements thereof and including data for storage therein so as to define a functionality of one or more of the plurality of hardware applications, and (b) deriving formatted data from the construction data being formatted for downloading to the device.

23. The method according to claim 22, wherein step (a) includes: (i) selecting topology, and (ii) formulating data for storing in at least one of the cells in the first plurality of cells and in the second plurality of cells and constituent elements defined by the topology.

24. The method according to claim 22, wherein step (a) includes: (i) accessing a library of pre-configured files each containing construction data relating to a respective functionality, and (ii) selecting one or more of the files for realizing one or more functionalities of the application.

25. The method according to claim 22, wherein step (a) includes: (i) running a computer simulation of one or more of the plurality of hardware applications by programming a computer to implement at least one of the at least one cell and the at least one constituent element and to implement a preconfigured topology using the construction data, and (ii) changing the construction data and repeating step (i) as necessary until the computer simulation is satisfactory.

26. A method for obtaining construction data for a universal hardware device comprising: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole universal hardware device, and in that at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells and representing a topology defining a connectivity between at least one of cells and constituent elements thereof and including data for storage therein so as to define a functionality of a hardware application, the method comprising: (a) selecting the topology, and (b) formulating data for storing in the at least one of cells and constituent elements defined by the topology.

27. The method according to claim 26, further including: (c) adding the constructed data to a library of pre-configured files each containing construction data relating to a respective functionality.

28. The method of claim 26, further including: (a) running a computer simulation of one or more of the plurality of hardware applications by programming a computer to implement at least one of the at least one cell and the at least one constituent element and to implement a pre-configured topology using the construction data, and (b) changing the construction data and repeating step (a) as necessary until the computer simulation is satisfactory.

29. A method for obtaining construction data for a universal hardware device comprising: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole universal hardware device, and in that at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells and representing a topology defining a connectivity between at least one of the cells in the first plurality of cells and in the second plurality of cells and at least one of constituent elements thereof and including data for storage therein so as to define a functionality of a hardware application, the method comprising: (a) accessing a library of pre-configured files each containing construction data relating to a respective functionality, and (b) selecting one or more of the files for realizing at least partial functionality of the application.

30. The method according to claim 29, further including: (c) adding the constructed data to a library of pre-configured files each containing construction data relating to a respective functionality.

31. The method according to claim 30, wherein the library is stored remotely and the step of accessing the library is effected via a communication channel.

32. The method according to claim 30, wherein steps (b) and (c) include programming the computer using a high-level programming language.

33. The method of claim 30, wherein the library is stored remotely and the step of accessing the library is effected via a communication channel.

34. The method according to claim 29, further including: (d) running a computer simulation of one or more of the plurality of hardware applications by programming a computer to implement at least one of the at least one cell and the at least one constituent element and to implement a pre-configured topology using the construction data, and (e) changing the construction data and repeating step (d) as necessary until the computer simulation is satisfactory.

35. The method according to claim 34, wherein instantaneous samples of part of the simulation construction data are continuously calculated on the fly by the computer.

36. The method according to claim 34, wherein at least part of the simulation construction data is pre-configured and stored and instantaneous samples thereof are fetched as required by the computer.

37. The method of claim 34, wherein instantaneous samples of part of the simulation construction data are continuously calculated on the fly by the computer.

38. The method according to claim 29, wherein step (b) includes programming the computer using a high-level programming language.

39. The method according to claim 29, wherein at least part of the simulation construction data is pre-configured and recalled from storage.

40. The method of claim 29, wherein the library is stored remotely and the step of accessing the library is effected via a communication channel.

41. A method for implementing a hardware application, the method comprising: (a) using a universal hardware device comprising: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole universal hardware device, and in that at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells, (b) obtaining formatted data that is derived from predetermined simulation construction data and is formatted for downloading to the device, and (c) downloading the formatted data to the device.

42. The method according to claim 41, wherein prior to performing step (b) there are included the steps of: (i) designing one or more of the plurality of hardware applications by obtaining construction data representing a topology defining a connectivity between at least one of the cells in the first plurality of cells and in the second plurality of cells and constituent elements and including data for storage therein so as to define a functionality of one or more of the plurality of hardware applications and deriving formatted data from the construction data being formatted for downloading to the device so as to produce the simulation construction data, and (ii) formatting the simulation construction data so as to produce the formatted data.

43. A method for simulating a hardware application implemented by a universal hardware device comprising: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole universal hardware device, and in that at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells and wherein at least one of the cells in the first plurality of cells and in the second plurality of cells and constituent elements are connected by a pre-selected topology and are loaded with data to implement a desired hardware application and being adapted for stepwise running, the method comprising: (a) downloading the data into at least one device in an emulation module including at least one hardware device comprising a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole universal hardware device, and in that at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells and wherein at least one of the cells in the first plurality of cells and in the second plurality of cells and the constituent elements are connected by a pre-selected topology and are loaded with data to implement a desired hardware application and at least one I/O interface, (b) routing an input sample generated by a control unit via the at least one I/O interface to the emulation module, (c) collecting an output sample for analysis from the emulation module via the at least one I/O interface, (d) receiving an authorization signal for authorizing input of a subsequent input sample to the emulation module, and (e) repeating steps (b) to (d), as required.

44. The method according to claim 43, wherein the control unit is a computer.

45. The method according to claim 44, wherein the computer receives the output signal for performing analysis thereof.

46. A method for stepwise running an application with a module including at least one hardware device comprising a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole hardware device, and in that at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells and wherein at least one of the cells in the first plurality of cells and in the second plurality of cells and constituent elements are connected by a pre-selected topology and are loaded with data to implement a desired hardware application and at least one I/O interface, the method comprising: (a) externally feeding a clock enable signal so as to feed the at least one clock signal to the application, and (b) internally disabling the clock enable signal so as to prevent feeding of the at least one clock signal pending override by the application.

47. A method for testing a device included within a module including at least one hardware device comprising a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole hardware device, and in that at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells and wherein at least one of the cells in the first plurality of cells and in the second plurality of cells and constituent elements are connected by a pre-selected topology and are loaded with data to implement a desired hardware application and at least one I/O interface, the method comprising the steps of: (a) connecting at least one of the interfaces of the module under test to a control unit, (b) configuring the device under test to port out samples at specific points of the device via at least one of the interfaces, (c) storing the samples in the control unit so as to obtain in real time a history of samples of the device, (d) using the control unit to analyze at least a subset of most recent samples, (e) if necessary arresting operation of the device under test so as to allow: (i) examination of the history of samples of the device, (ii) examination of the instantaneous current state of the arrested device, (iii) download of a different state to the arrested device, (iv) continuation of the real time operation of the device, and (v) restarting real time operation of the device.
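Steps (c) to (e) of claim 47 amount to keeping a bounded history of ported-out samples, analyzing the most recent ones, and arresting the device when the analysis calls for it. The sketch below shows that loop under stated assumptions: the `ControlUnit` class, the buffer depth, the analysis window, and the trigger rule are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional


class ControlUnit:
    """Keeps a bounded history of ported-out samples and decides when to arrest."""

    def __init__(self, depth: int, window: int,
                 trigger: Callable[[List[int]], bool]) -> None:
        self.history: Deque[int] = deque(maxlen=depth)  # oldest samples fall off
        self.window = window
        self.trigger = trigger

    def store(self, sample: int) -> bool:
        """Record one sample; return True if the device should be arrested."""
        self.history.append(sample)
        recent = list(self.history)[-self.window:]
        return len(recent) == self.window and self.trigger(recent)


def run_until_arrest(samples: Iterable[int], unit: ControlUnit) -> Optional[int]:
    """Feed samples in order; return the index at which operation was arrested."""
    for index, sample in enumerate(samples):
        if unit.store(sample):
            return index
    return None


# Hypothetical trigger: arrest when the three most recent samples are identical.
unit = ControlUnit(depth=4, window=3, trigger=lambda recent: len(set(recent)) == 1)
ported_samples = [1, 2, 3, 7, 7, 7, 9]
arrest_index = run_until_arrest(ported_samples, unit)
history_at_arrest = list(unit.history)
```

Once arrested, `history_at_arrest` is what would be examined under step (e)(i); examining or replacing the device state itself is outside this sketch.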

48. The method according to claim 47, wherein the control unit is a computer having coupled thereto via the at least one I/O interface.

49. A method for designing an application-specific hardware device comprising: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole hardware device, and in that at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells and having a desired functionality, the method comprising: (a) obtaining formatted data which in conjunction with the device defines a device architecture that realizes the desired functionality, (b) replacing at least one of the first programmable matrix and the second programmable matrix of the device architecture by fixed connections that realize the connectivity of at least one of the first programmable matrix and the second programmable matrix, and (c) replacing storage elements in at least some of the cells in the first plurality of cells and in the second plurality of cells in the device architecture by respective fixed drive levels for realizing the data stored therein; whereby the architecture is rendered suitable for direct implementation of the application-specific hardware device.

50. A method for designing a hardware device comprising: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole hardware device, and in that at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells and having a desired functionality part of which is fixedly implemented by an application-specific hardware device and part of which is re-programmable, the method comprising: (a) obtaining formatted data which in conjunction with the device defines a device architecture that realizes the desired functionality, (b) replacing a part of at least one of the first programmable matrix and the second programmable matrix of the device architecture by fixed connections that realize the connectivity of part of at least one of the first programmable matrix and the second programmable matrix, and replacing some storage elements in at least some of the cells in the first plurality of cells and in the second plurality of cells in the device architecture by respective fixed drive levels for realizing the data stored therein.

51. The method according to claim 50, wherein prior to downloading the formatted data to a device comprising: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole universal hardware device, and in that at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells and wherein at least one of the cells in the first plurality of cells and in the second plurality of cells and the constituent elements are connected by a pre-selected topology and are loaded with data to implement a desired hardware application there are further included the steps of: (i) using the formatted data to create device-specific extended formatted data which includes the formatted data and a fault list of any faulty cells or connections, (ii) testing cells and connections in the device and updating the fault list accordingly, and (iii) changing the formatted data within the device-specific extended formatted data, if necessary so as to avoid using any tested cells or connection that were found faulty.

52. A method for real time automatic fault detection and correction of a device comprising: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole device, and in that at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells and wherein at least one of the cells in the first plurality of cells and in the second plurality of cells and constituent elements are connected by a pre-selected topology and are loaded with data to implement a desired hardware application, the method comprising: (a) using device-specific extended formatted data which includes the formatted data and includes a fault list of any faulty cells or connections in order to locate an unused cell that is not in the fault list, and if such an unused cell is located: (b) testing the unused cell and its connections, (c) if at least one of the unused cell and any of its connections are faulty, updating the device-specific extended formatted data and repeating from step (a) until an unused cell that is not faulty is located, (d) using the device-specific extended formatted data to select a used cell for testing, (e) using the device-specific extended formatted data to duplicate the selected cell and its connections on to the unused cell, (f) disconnecting the selected cell and updating the device extended formatted data accordingly, and (g) repeating from step (a) as required.

53. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for designing a hardware application to be implemented using a device or a module including the device and at least one I/O interface, wherein: the device comprises: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells; wherein at least one cell in the first plurality of cells comprises a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole device, and wherein at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality of cells and wherein a plurality of hardware applications can be implemented by selectively storing data in the first plurality of cells and the second plurality of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells; and the method comprises: (a) obtaining construction data representing a topology defining a connectivity between at least one of the cells in the first plurality of cells and in the second plurality of cells and constituent elements and including data for storage therein so as to define a functionality of the one or more of the plurality of hardware applications and (b) deriving formatted data from the construction data being formatted for downloading to the device.

54. A computer program product comprising a computer useable medium having computer readable program code embodied therein for designing a hardware application to be implemented using a universal hardware device comprising: a first plurality of cells for storing data; and a first programmable matrix connected to inputs and outputs of the first plurality of cells, wherein at least one cell in the first plurality of cells comprises: a second plurality of cells for storing data; and a second programmable matrix connected to inputs and outputs of the second plurality of cells, whereby each cell of the first plurality of cells and of the second plurality of cells has an architecture similar to the whole universal hardware device, and in that at least one cell of the second plurality of cells can be directly accessed via a port of the at least one cell in the first plurality and wherein a plurality of hardware applications can be implemented by selectively storing data in the first and second pluralities of cells and by selectively programming the first programmable matrix to connect cells in the first plurality of cells and the second programmable matrix to connect cells in the second plurality of cells and at least one I/O interface, the computer program product comprising: computer readable program code for causing the computer to obtain construction data representing a topology defining a connectivity between at least one of the cells and constituent elements and including data for storage therein so as to define a functionality of the hardware application, and computer readable program code for causing the computer to derive formatted data from the construction data being formatted for downloading to the device.
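
By way of illustration only, the fault detection and correction loop recited in claim 52 may be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: the names ExtendedData, find_good_spare and rotate_once and the is_faulty test callback are hypothetical, cells are modelled as integer identifiers, and duplicating a cell is reduced to reassigning a logical function to a different physical cell.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class ExtendedData:
    """Hypothetical stand-in for the device-specific extended formatted data."""
    all_cells: Set[int]
    mapping: Dict[str, int]                      # logical function -> physical cell in use
    fault_list: Set[int] = field(default_factory=set)

    def unused_candidates(self) -> List[int]:
        """Cells that are neither in use nor already listed as faulty."""
        used = set(self.mapping.values())
        return sorted(self.all_cells - used - self.fault_list)


def find_good_spare(data: ExtendedData,
                    is_faulty: Callable[[int], bool]) -> Optional[int]:
    """Steps (a)-(c): locate an unused cell not in the fault list and test it."""
    for cell in data.unused_candidates():
        if is_faulty(cell):
            data.fault_list.add(cell)            # step (c): record the fault, keep looking
        else:
            return cell
    return None


def rotate_once(data: ExtendedData, function: str,
                is_faulty: Callable[[int], bool]) -> Optional[int]:
    """Steps (d)-(f): move one used cell's function on to a tested spare.

    Returns the released cell, or None if no good spare was found.
    """
    spare = find_good_spare(data, is_faulty)
    if spare is None:
        return None
    released = data.mapping[function]            # step (d): used cell selected for testing
    data.mapping[function] = spare               # step (e): duplicate on to the spare
    # Step (f): the released cell is no longer mapped, so it becomes an unused
    # candidate and is tested on a later pass through steps (a)-(c).
    return released
```

In this sketch, repeating rotate_once over the mapped functions corresponds to step (g): each used cell is eventually released, tested as an unused cell and, if faulty, added to the fault list.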

Description

FIELD OF THE INVENTION

This invention relates to circuit design and testing and device architecture.

BACKGROUND OF THE INVENTION

Known design and manufacturing processes of integrated circuits (ICs) and modules containing ICs, including implementation using DSPs or Application Specific Integrated Circuits (ASICs), require lengthy development cycles and are expensive. In particular, the time required to market integrated circuits is long owing to the length of the development period requiring protracted design, verification and testing of the application. During verification and testing of the product, failures in the design may be detected requiring debugging and repair; this greatly adds to the development cost and time, particularly when design failures are detected at the end of the process, for example after product delivery.

Hardware products are expensive since the time to market influences engineering costs, market loss, and so on. Costs must also bear the overhead of testing equipment used during development, the expense of testing equipment used during manufacturing, inventory size and the costs relating to employment of professional engineers.

Simulation accuracy is poor. One simulation is required to simulate the desired functionality of the application, while another is required to simulate the run-time operation of the implementation. If simulation of high-resolution delays is required, simulation becomes cumbersome. Moreover, simulations are very slow, often hundreds of thousands of times slower than real-time.

Yet another drawback of known design and development processes is that the size of prototypes and non-ASIC electronic cards may exceed practical dimensions. The ability to test high-end System on a Chip products is limited owing to the very high density of integrated circuits, making it very difficult to probe points of interest in the circuit. It is likewise difficult to test finished products containing high-density integrated circuits, and to verify the complete functionality of complex circuits.

The adaptation of already designed products to advanced chip production technologies, so as to ensure the compatibility of applications with newer technologies, is poor. This not only affects the application developer but also makes it difficult for the chip manufacturer to employ new geometries whilst enabling developers to use their existing applications.

ASICs are used to implement an application using a smaller area, where the application will be sold in sufficient quantity to justify customization. Typically, the application is first developed using conventional design methods and, after establishing the integrity of the design, it is converted to an ASIC. This is a time-consuming and expensive process.

Typically, application speed is enhanced by implementing the application in hardware at the expense of functional flexibility, since most hardware implementations are dedicated to a specific application and are not amenable to extension or changes.

Yet another drawback associated with the industry is the relative scarcity of qualified personnel and the difficulty in breaking down the design and development so as to be amenable to sharing amongst several engineers in order that the development cycle may be reduced.

All these limitations combine to increase both the length and the cost of the development and manufacturing cycle. In order to demonstrate the complexity and effort associated with conventional design and manufacturing processes for implementing logic integrated circuits or modules including them, various implementation methods will now be described.

FIG. 1 is a flow diagram showing the conventional process for hardware implementation of Integrated Circuit (IC) chips on electronic cards. During the card development process 7, a card is developed for performing a needed application. After this process, the card is ready for the production activities. The idea and concept relating to the application are defined during the application definition 10, after which there follows the interface definition 12. Sometimes the interface is determined by the system environment; for example in a PC application card. Sometimes the firm that made the card can choose the interface; for example in a rack full of cards, most of the cards will have the same interface. Sometimes the interface is unique and has to be designed in the same process as the rest of the card.

Technology selection 16 is influenced by real-time needs, flexibility, size and the number of tasks. Hardware implementation is selected for an application owing to its superior real-time performance. Hardware implementation is the fastest (real-time) solution, but it can do only a limited number of tasks, is not flexible and is large. Time to market is lengthy. The process from development to manufacturing may take months to more than a year, nine months being considered a very good result. The development and manufacturing processes are expensive.

The space available for the application is an important parameter for the technology selection process. Normally, the smaller the better. If size is critical, the card can be converted into an ASIC in which case the development period is extended by several months owing to the need for ASIC conversion. Nine months is a normal extension time. In practice, ASIC conversion suffers from all the problems mentioned above.

In most cases, particularly for complex high speed, digital hardware design, simulation 18 of the application is required. Sometimes the simulation is considered part of the application definition process, and is not counted as part of the hardware development process time. The simulation process period may last from days to a few months, depending on the complexity of the application. The purpose of the simulation is to verify that the idea and the implementation considered are feasible. For example: if there is an idea to compress voice over communication links, the compression algorithm will be made using a high level language such as "C++" and other software tools such as "Matlab" (of Mathworks, Natick, Mass.). There is no simple association between the simulation and the implementation. Sometimes, more than one simulation is made and workstations may be needed to increase the simulation speed.

The interface must then be designed 20. If the interface is standard, for example a PCI interface in a PC card, in most cases there is no need to redesign it and an "off the shelf" ready-made chip set may be used in the implementation. If the interface is proprietary, the hardware developer might prefer to separate the development of the interface from the development of the application so as to be able to use the interface for other applications as well, such as the interface for a few cards in the same rack. In such a case, the time to market would not be influenced by the time needed to develop and implement the interface. Only when the interface is unique, does it become part of overall implementation development.

Electronic design 22 is the procedure that converts the solution into a set of electronic Integrated Circuit devices. The developer keeps in mind a library of such devices, which can perform certain functions and chooses the needed ICs to be connected together in order to implement the application. The electronic design process is complicated, as the engineer has to remember a lot of different components and the way to use them. As the technology improves, new components with complicated functions are added. Sometimes the electronics engineer is left behind. High-Level Design languages have been implemented, but still most of the designing has to be done in modular fashion. The more complicated the application, the more time is needed for the design.

Drawings are the interface between the application engineer and the computerized tools used to manufacture the card. This is the "language" the engineer uses to "write" his implementation ideas. The drawing process 24 converts the design into schematic drawings. The more complicated the application, the more time is needed for the drawings. If a programmable device is used, High-level Design Language (HDL) may be used to replace part of the drawings.

When implementation is not trivial, simulation 25 of the design is done and timing is checked. The designer tries to correlate between the result of the application simulation 18 and the current simulation. If any mistake is found, the design is modified, requiring steps 22 and 24 to be repeated.

After the design has been finished, the components 26 are obtained. It sometimes takes quite a long time to purchase a specific device, thereby delaying prototype production. The more complex the application, the more components are used, the longer is the time to production and the bigger is the inventory of components.

Layout 28 is the process that converts the drawn design into a manufactured package. The more complex the application, the more components are used and the longer the layout time.

The board is manufactured 30. For each application a different board must be manufactured thus increasing the amount of human resources invested and the need for expensive equipment for card verification and manufacture.

The components are installed on to the manufactured card. Faults in the layout can cause problems in the installation. For example, the tools for installing the prototypes are normally different from those used for manufacturing. Specifically, installation in the development phase uses less automation and the probability of a faulty card is greater. To shorten the installation period, special purpose, expensive equipment is used, for example high pitch chip insertion equipment.

Test and debug 34 are the longest periods in the overall development process. Fifty to seventy-five percent of the time spent on a complex design goes into verification. Verification is quickly becoming the biggest technology barrier.

Errors can be made in each one of the above tasks. For example, narrow spacing between tracks of a printed circuit can cause a short circuit. It is worse if this kind of mistake is not discovered in the debugging process, because it may be found later when it is more expensive to repair. Errors can also be made in card definition, and so on. That means repetitive processes occur. It is normal to have three versions of the prototype before the first batch of production. Expensive test equipment is needed to test the electronic card, for example: signal generators, noise generators, logic analyzers, scopes, dB meters, adders, line simulators and others.

In the R&D to production process 36, the documentation with all the details needed to manufacture the card is created. Although this process can start before the last version of the prototype is ready, the process extends the time to market. Automatic tools for card verification (such as bed of nails), and function verification are created. Automatic component insertion machinery is programmed, and so on.

The above process causes the development and manufacturing cycle to be long. The later an error is found the harder and the more expensive is the repair. Therefore, if an ASIC is needed, considerable effort is made to assure error-free results. It is normal to manufacture a few batches before converting the electronics into an ASIC.

Once the card documentation is ready, all the components have been procured, chip insertion machines have been programmed, and so on, the card production process 37 can commence. There then follows production card verification 38 wherein the card is tested with or without its electronics. Function verification 40 is the process of testing the card for the designed application. It is quite complicated and time consuming to create automatic equipment for testing each function of the application. When the result is satisfactory, the card delivery 42 to the client may be performed.

If an error is found or an enhancement is needed at the end of the process, repairs and enhancements 44 are very difficult, expensive and time consuming. In the worst case, most of the process has to be repeated.

As mentioned above, real-time needs, flexibility, size and number of tasks influence the choice of technology. A circuit may be implemented as a Digital Signal Processor when flexibility is needed and/or a large number of tasks are to be performed, but not at the same time. This solution is slower in real-time than an equivalent hardware implementation. The DSP implementation is about the same size as the hardware implementation, but generally DSP implementations do not allow the option of conversion into ASIC.

Although the above-described hardware development process must also be implemented for the DSP card, it rarely influences the time to market period for several reasons. First, the hardware implementation is simple, as the DSP vendors propose solutions for the hardware design. Only a non-standard interface needs to be designed. Second, once a card is ready and the interface is fixed, a ready-made card can be used for the new application. Normally the process of developing the programming code takes more time than the process of developing the DSP card. Nevertheless, the card has to be developed at least once. Different problems than those relating to hardware implementation must also be addressed, such as which kind of DSP to choose, whether the DSP is going to satisfy the needs of next-generation applications and so on.

As the industry adheres to Gordon Moore's Law, doubling the number of transistors in a die every 18 months by shrinking the feature size (transistors and interconnects), products are becoming obsolete with each new semiconductor generation. Therefore new DSP/CPU development is needed frequently. Very commonly vendors do their best to enable software compatibility, but in practice conversion time is needed. This conversion process influences the cost as well as the time to market.

DSP implementation has advantages over hardware implementation in the simulation stage as both the simulation and the implementation can be written in a high level language such as "C". In practice, there is no efficient means of conversion from the simulation to the DSP code. In other words, the simulation does not simulate the exact implementation, especially if assembly language is used in the DSP coding. Also, the cost of R&D is high. The cost calculation has to consider the development of the card and the development of the software. The market is short of DSP experts so the wages expense is high.

In production, automatic function verification is still hard to implement, but if an error or enhancement is discovered after delivery, it still can be fixed by changing the software at the customer's premises, although in practice this is far from trivial. The easy part is loading the revised code into the implementation.

Sometimes a combined implementation is preferred. For example, if a filter is needed in a DSP implementation, the filter may be implemented in hardware and the rest of the application will be implemented in the DSP. The advantages and disadvantages of each part remain.

Programmable Logic Devices (PLDs) allow for flexible implementation but are limited in the application capability for a given chip area. A simulation language has been converted into a High-level Design Language (HDL/VHDL) to enable the designer to create implementations. Nevertheless, these software languages enable the user to create the hardware in modular form, so they are far behind languages like C++. Simulation is not accurate, debugging is complicated and the product is expensive. When size is critical, it is most common to implement the application using a few high-end PLDs as a "fast prototype" and then to convert the application into ASIC. In this case, it is common to have a few iterations for the ASIC development, which increases price and time to market.

The ASIC development process is described, for example, by Texas Instruments Incorporated (Dallas, Tex. USA, 75380-9066), whose Web address is http://www.ti.com/sc/docs/asic/cad/cad.htm.

Reference is also made to http://www.verisity.com/html/spechased.html belonging to Verisity Design, Inc. Mountain View, Calif. USA. Likewise, further information relating to Electronic Design Automation may be found by reference to http://www.wsdmag.com/library/penton/archives/wsd/January1998/261.htm which acknowledges that electronic-design-automation (EDA) technology has lagged behind the rate of progress of semiconductor fabrication.

Some of the drawbacks associated with the design, manufacturing and verification process have been addressed in the patent literature. U.S. Pat. No. 5,815,726 (Cliff; Richard G.) entitled "Coarse-grained look-up table architecture" published Sep. 29, 1998 and assigned to Altera Corporation discloses a programmable logic device architecture. For interconnecting signals to and from the logic array blocks, the global interconnection resources include switch boxes, long lines, double lines, single lines, and half- and partially populated multiplexer regions. The logic array block includes two levels of function blocks. In a first level, there are eight four-input function blocks. In a second level, there are two four-input function blocks and four secondary two-input function blocks. In one embodiment, these function blocks are implemented using look-up tables (LUTs). The logic array block has combinatorial and registered outputs and also contains storage blocks for implementing sequential or registered logic functions. The logic array block has a carry chain for implementing logic functions requiring carry bits and may also be configured to implement a random access memory.
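
As an illustration of how a look-up table can implement a function block, the following minimal Python sketch models a four-input LUT as a 16-entry truth table indexed by its inputs. The Lut class and its method names are hypothetical and are not taken from U.S. Pat. No. 5,815,726; the sketch shows only the general LUT principle, not the two-level arrangement of function blocks described there.

```python
from typing import Callable, Sequence


class Lut:
    """Hypothetical k-input look-up table: one stored bit per input combination."""

    def __init__(self, num_inputs: int, func: Callable[..., int]):
        self.num_inputs = num_inputs
        # "Program" the LUT by evaluating the desired function once for every
        # input combination and storing the resulting bit.
        self.table = [
            func(*[(index >> bit) & 1 for bit in range(num_inputs)]) & 1
            for index in range(1 << num_inputs)
        ]

    def read(self, inputs: Sequence[int]) -> int:
        """Return the stored bit addressed by the given input values."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        index = sum((value & 1) << bit for bit, value in enumerate(inputs))
        return self.table[index]


# A four-input LUT programmed as (a AND b) XOR (c OR d).
lut4 = Lut(4, lambda a, b, c, d: (a & b) ^ (c | d))
```

Changing the implemented logic function requires only that the stored table be rewritten; the read path is unchanged.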

U.S. Pat. No. 5,909,450 (Wright; Adam) entitled "Tool to reconfigure pin connections between a DUT and a tester" published Jun. 1, 1999 and assigned to Altera Corporation discloses a method of simulating the testing of integrated circuits. A database of desired connections between a tester unit and a device under test (DUT) for different downbonds is accessed by a multiplexer which sets up the desired connections. The system automatically makes the correct connection for each downbond without manual intervention from the user as was required in traditional simulator systems.

U.S. Pat. No. 5,821,773 (Norman; Kevin A. et al.) entitled "Look-up table based logic element with complete permutability of the inputs to the secondary signals" published Oct. 13, 1998 and assigned to Altera Corporation discloses a logic element for a programmable logic device. The logic element includes a look-up table for implementing logical functions, a programmable delay block, a storage block configurable as a latch or a flip-flop, and a diagnostic shadow latch. A plurality of inputs to the logic element and complements of these inputs are available to control the secondary functions of the storage block.

U.S. Pat. No. 6,018,490 (Cliff; Richard G. et al.) entitled "Programmable logic array integrated circuits" published Jan. 25, 2000 and assigned to Altera Corporation discloses a programmable logic array integrated circuit having a number of programmable logic modules which are grouped together in a plurality of logic array blocks. The logic array blocks are arranged on the circuit in a two dimensional array. A conductor network is provided for interconnecting any logic module with any other logic module. In addition, adjacent or nearby logic modules are connectable to one another for such special purposes as providing a carry chain between logic modules and/or for connecting two or more modules together to provide more complex logic functions without having to make use of the general interconnection network. Another network of so-called fast or universal conductors is provided for distributing widely used logic signals such as clock and clear signals throughout the circuit. Multiplexers can be used in various ways to reduce the number of programmable interconnections required between signal conductors.

U.S. Pat. No. 6,058,492 (Sample; Stephen P. et al.) entitled "Method and apparatus for design verification using emulation and simulation" published May 2, 2000 and assigned to Quickturn Design Systems, Inc. discloses a method and apparatus for combining emulation and simulation of a logic design. The method and apparatus can be used with a logic design that includes gate-level descriptions, behavioral representations, structural representations, or a combination thereof. The emulation and simulation portions are combined in a manner that minimizes the time for transferring data between the two portions. Simulation is performed by one or more microprocessors while emulation is performed in reconfigurable hardware such as field programmable gate arrays. When multiple microprocessors are employed, independent portions of the logic design are selected to be executed on the multiple synchronized microprocessors. Reconfigurable hardware also performs event detecting and scheduling operations to aid the simulation, and to reduce processing time.

U.S. Pat. No. 5,815,715 (Kucukcakar et al.) entitled "Method for designing a product having hardware and software components and product therefor" published Sep. 29, 1998 and assigned to Motorola, Inc. discloses a computing system and a method for designing the computing system using hardware and software components. The computing system includes programmable coprocessors having the same architectural style. Each coprocessor includes a sequencer and a programmable interconnect network and a varying number of functional units and storage elements. The computing system is designed by using a compiler to generate a host microprocessor code from a portion of an application software code and a coprocessor code from the portion of the application software code. The compiler uses the host microprocessor code to determine the execution speed of the host microprocessor and the coprocessor code to determine the execution speed of the coprocessor and selects one of the host microprocessor or the coprocessor for execution of the portion of the application software code. Then the compiler creates a code that serves as the software program.
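
The selection step summarized above, in which each portion of the application is assigned to whichever target executes it faster, may be sketched in Python as follows. The sketch is an assumption-laden illustration and not the method of U.S. Pat. No. 5,815,715: the Portion record, the assign_targets function and the timing figures used to exercise it are all hypothetical, and the execution-time estimates are simply supplied as inputs rather than derived from generated code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Portion:
    """Hypothetical record for one portion of the application software code."""
    name: str
    host_time_us: float          # assumed estimate for the host microprocessor code
    coprocessor_time_us: float   # assumed estimate for the coprocessor code


def assign_targets(portions: List[Portion]) -> Dict[str, str]:
    """Pick, per portion, whichever target has the lower estimated execution time.

    Ties are resolved in favour of the host (an arbitrary choice for this sketch).
    """
    return {
        p.name: "coprocessor" if p.coprocessor_time_us < p.host_time_us else "host"
        for p in portions
    }
```

For example, a portion estimated at 120 microseconds on the host and 15 microseconds on the coprocessor would be assigned to the coprocessor, while a portion that runs faster on the host would remain there.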

U.S. Pat. No. 6,058,452 (Rangasayee; Krishna) entitled "Memory cells configurable as CAM or RAM in programmable logic devices" published May 2, 2000 and assigned to Altera Corporation discloses a programmable logic device having content addressable memory. The programmable logic device may include reconfigurable dual mode memory suitable for operating as a content addressable memory in a first mode and a random access memory in a second mode. Mode control switch circuitry may be provided to selectively enable a user to configure the dual mode memory as either content addressable memory or random access memory.
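
The difference between the two modes can be illustrated with a minimal Python sketch: in random access mode an address selects a stored word, whereas in content addressable mode a data word is presented and the matching address is returned. The DualModeMemory class, its method names and the assumption that writes behave identically in both modes are hypothetical and are not taken from U.S. Pat. No. 6,058,452.

```python
from typing import List, Optional


class DualModeMemory:
    """Hypothetical memory block usable as RAM (address -> data) or CAM (data -> address)."""

    def __init__(self, depth: int, mode: str = "ram"):
        if mode not in ("ram", "cam"):
            raise ValueError("mode must be 'ram' or 'cam'")
        self.mode = mode
        self.words: List[Optional[int]] = [None] * depth

    def write(self, address: int, data: int) -> None:
        # Assumption for this sketch: writing is the same in both modes.
        self.words[address] = data

    def read(self, address: int) -> Optional[int]:
        """RAM mode: return the word stored at the given address."""
        if self.mode != "ram":
            raise RuntimeError("read by address requires RAM mode")
        return self.words[address]

    def match(self, data: int) -> Optional[int]:
        """CAM mode: return the first address holding the given word, if any."""
        if self.mode != "cam":
            raise RuntimeError("match by content requires CAM mode")
        # Hardware would compare all words in parallel; a linear scan models the result.
        for address, word in enumerate(self.words):
            if word == data:
                return address
        return None
```

Setting the mode attribute plays the role of the mode control switch circuitry in this sketch.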

U.S. Pat. No. 6,078,736 (Guccione; Steven A.) entitled "Method of designing FPGAs for dynamically reconfigurable computing" published Jun. 20, 2000 and assigned to Xilinx, Inc. discloses a method of designing FPGAs for reconfigurable computing comprising a software environment for reconfigurable coprocessor applications. This environment comprises a standard high-level language compiler (i.e. Java) and a set of libraries. The FPGA is configured directly from a host processor, configuration, reconfiguration and host run-time operation being supported in a single piece of code. Design compile times on the order of seconds and built-in support for parameterized cells are significant features of the inventive method.

U.S. Pat. Nos. 6,031,391 and 6,097,211 (Couts-Martin; Chris et al.) both entitled "Configuration memory integrated circuit" published Feb. 29, 2000 and Aug. 1, 2000 respectively and assigned to Altera Corporation disclose a configuration memory for storing information that is in-system programmable. The programming of the configuration memory may be performed using JTAG (IEEE Standard 1149.1) instructions. Furthermore, the configuration of a programmable logic device using the configuration data in the configuration memory may be initiated with a JTAG instruction. Pull-up resistors are incorporated within the configuration memory package.

U.S. Pat. No. 5,894,228 (Reddy; Srinivas et al.) entitled "Tristate structures for programmable logic devices" published Apr. 13, 1999 and assigned to Altera Corporation discloses a programmable logic device architecture including tristate structures. The programmable logic device architecture provides tristate structures which may be logically or programmably controlled, or both. Through these tristate structures, the logic elements may be coupled to the programmable interconnect, where they may be coupled with other logic elements of the programmable logic device. Using these tristate structures, the signal pathways of the architecture may be dynamically reconfigured.

U.S. Pat. No. 6,026,230 (Lin; Sharon Sheau-Pyng et al.) entitled "Memory simulation system and method" published Feb. 15, 2000 and assigned to Axis Systems, Inc. discloses a system having four modes of operation: (1) Software Simulation, (2) Simulation via Hardware Acceleration, (3) In-Circuit Emulation (ICE), and (4) Post-Simulation Analysis. At a high level, the system may be embodied in each of the above four modes or various combinations of these modes. At the core of these modes is a software kernel that controls the overall operation of this system. The main control loop of the kernel executes the following steps: initialize system, evaluate active test-bench processes/components, evaluate clock components, detect clock edge, update registers and memories, propagate combinational components, advance simulation time, and continue the loop as long as active test-bench processes are present. The Memory Mapping aspect of the invention provides a structure and scheme where the numerous memory blocks associated with the user's design are mapped into the SRAM memory devices in the Simulation system instead of inside the logic devices, which are used to configure and model the user's design. The Memory Mapping or Memory Simulation system includes a memory state machine, an evaluation state machine, and their associated logic to control and interface with: (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the Simulation system, and (3) the FPGA logic devices which contain the configured and programmed user design that is being debugged.

U.S. Pat. No. 6,020,759 (Heile; Francis B.) entitled "Programmable logic array device with random access memory configurable as product terms" published Feb. 1, 2000 and assigned to Altera Corporation discloses a look-up-table-based programmable logic device provided with memory circuitry which can be operated either as random access memory ("RAM") or to perform product term ("p-term") logic. Each individual row of the memory is separately addressable for writing data to the memory or, in RAM mode, for reading data from the memory. Alternatively, multiple rows of the memory are addressable in parallel to read p-terms from the memory. The memory circuitry of the invention is particularly useful as an addition to look-up-table-type programmable logic devices because the p-term capability of the memory circuitry provides an efficient way to perform wide fan-in logic functions which would otherwise require trees of multiple look-up tables.

U.S. Pat. No. 6,028,809 (Schleicher; James) entitled "Programmable logic device incorporating a tri-stateable logic array block" published Feb. 22, 2000 and assigned to Altera Corporation discloses a programmable logic device that incorporates a multi-function block having a plurality of integrally connected function units where at least one of the function units within the multi-function block is a tristate logic unit. The programmable logic device also includes a tristate bus operatively connected to the tristate logic unit that can supply tristate logic signals to the tristate bus as well as receive tristate logic signals from the tristate bus. The tristate bus carries tristate data signals and address select signals that operate to select a desired one of the tristate logic units within the programmable logic device.

U.S. Pat. No. 6,085,317 (Smith; Stephen J.) entitled "Reconfigurable computer architecture using programmable logic devices" published Jul. 4, 2000 and assigned to Altera Corporation discloses a method and system for computing using reconfigurable computer architecture utilizing logic devices. The computing may be accomplished by configuring a first programmable logic unit as a system controller. The system controller directs the implementation of an algorithm in a second one of the programmable logic units concurrently with reconfiguring a third one of the programmable logic units. In another aspect, the computing system may include a pair of independent, bi-directional busses each of which is arranged to electrically interconnect the system controller and the plurality of programmable logic devices. With this arrangement, a first bus may be used to reconfigure a selected one of the programmable logic devices as directed by the system controller while the second bus is used by an operational one of the programmable logic devices.

U.S. Pat. Nos. 6,034,536 and 6,091,258 (McClintock; Cameron et al.) both entitled "Redundancy circuitry for logic circuits" published Mar. 7, 2000 and Jul. 18, 2000 respectively and assigned to Altera Corporation disclose redundant circuitry for a logic circuit such as a programmable logic device. The redundant circuitry allows the logic circuit to be repaired by replacing a defective logic area on the circuit with a redundant logic circuit. Rows and columns of logic areas may be logically remapped by row and column swapping. The logic circuit contains dynamic control circuitry for directing programming data to various logic areas on the circuit in an order defined by redundancy configuration data. Redundancy may be implemented using either fully or partially redundant logic areas. Logic areas may be swapped to re-map a partially redundant logic area on to a logic area containing a defect. The defect may then be repaired using row or column swapping or shifting. A logic circuit containing folded rows of logic areas may be repaired by replacing a defective half-row with a redundant half-row.

U.S. Pat. No. 6,069,489 (Iwanczuk; Roman et al.) entitled "FPGA having fast configuration memory data readback" published May 30, 2000 and assigned to Xilinx, Inc. discloses an FPGA configuration memory divided into columnar frames each having a unique address. Configuration data is loaded into a configuration register, which transfers configuration data frame by frame in parallel. In a preferred embodiment, an input register, a shadow input register and a multiplexer array permit efficient configuration data transfer using a larger number of input bits than conventional FPGAs. A flexible external interface enables connection with bus sizes varying from a predetermined maximum width down to a selected fraction thereof. Configuration data transfer is made more efficient by using shadow registers to drive such data into memory cells on a frame-by-frame basis with a minimum of delay, and by employing a multiplexer array to exploit a wider configuration data transfer bus. The speed of configuration read-back is made substantially equal to the rate of configuration data input by employing configuration register logic that supports bidirectional data transfer. Using the proposed FPGA configuration memory, a bit stream designed for an old device can be used for a new device having additional configuration memory cells.

U.S. Pat. No. 5,477,475 (Sample; Stephen P.) entitled "Method for emulating a circuit design using an electrically reconfigurable hardware emulation apparatus" published Dec. 19, 1995 and assigned to Quickturn Design Systems, Inc. discloses a system for physical emulation of electronic circuits or systems including a data entry workstation where a user may input data representing the circuit or system configuration. This data is converted to a form suitable for programming an array of programmable gate elements provided with a richly interconnected architecture. Provision is made for externally connecting VLSI devices or other portions of a user's circuit or system. A network of internal probing interconnections is made available by utilization of unused circuit paths in the programmable gate arrays.

SUMMARY OF THE INVENTION

It is an object of the invention to provide an improved device architecture particularly suited for the design of digital circuits that allows high flexibility and reduces the time from design to finished product.

To this end, there is provided in accordance with a broad aspect of the invention a universal hardware device consisting essentially of:

at least one plurality of cells for storing data; and

at least one programmable matrix coupled to said at least one plurality of cells, whereby a plurality of hardware applications may be implemented by selectively storing data in said cells and selectively programming said matrix to connect at least one of said cells to at least one of said cells.

Such device architecture allows cells to be combined so as to form larger cells, which can themselves be combined to form larger cells, this process being repeated as required; and to configure the combined cell as a hardware application by downloading data to the constituent cells. Preferably, the cells are configurable as Look-Up Tables having addressable memory locations, in which the stored data defines a function implemented by the Look-Up Table. The function can itself be programmed using a high level programming language and may be formatted together with code for implementing a desired connectivity of the cells. The formatted data is then downloaded to the cells in the device. Once downloaded, the device carries out the pre-programmed functionality in a manner that is no longer dependent on the high-level program code used to implement the desired function. As a result, operation of the device is independent of the efficiency of the high-level program code. Identical code may be used to simulate the device thus greatly facilitating design and simulation of the device and greatly reducing the time from design to marketing.

The invention also provides tools for designing, simulating and debugging the hardware device. These tools can also assist in converting all or part of the device to an ASIC after establishing that the finished device operates as required, although the value of such conversion diminishes as the life expectancy of the product falls.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to understand the invention and to see how it may be carried out in practice, a preferred embodiment will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:

FIG. 1 is a flow diagram showing a conventional process for hardware implementation of IC chips on electronic cards;

FIG. 2 shows schematically the device architecture according to the invention;

FIGS. 3 and 4 show schematically alternative configurations of cells for use in the device shown in FIG. 2;

FIG. 5 shows schematically a detail of the cell shown in FIG. 4 including auxiliary circuitry;

FIG. 6 shows schematically the connectivity required to create a cell formed from two cells so as to have a larger address bus;

FIG. 7 shows schematically a matrix connecting D cell outputs to A cell inputs;

FIG. 8 shows schematically a saturated matrix that may be used in the circuit of FIG. 7;

FIG. 9 shows schematically a non-saturated matrix that may be used in the circuit of FIG. 7;

FIG. 10 shows schematically a counter using one cell as shown in FIG. 3;

FIG. 11 shows schematically an Up-Down counter using one cell as shown in FIG. 3;

FIG. 12 shows schematically a shift register using three interconnected cells as shown in FIG. 3;

FIG. 13 shows schematically a possible topology of a one-cell shift register used in the shift register shown in FIG. 12;

FIG. 14 shows schematically how two cells of the kind shown in FIG. 5 may be connected in tri-state;

FIG. 15 shows schematically a RAM-server according to a first embodiment using two cells of the type shown in FIG. 5;

FIG. 16 shows a possible timing diagram for the RAM-server shown in FIG. 15;

FIG. 17 shows schematically a RAM-server according to a second embodiment using two cells of the type shown in FIG. 5;

FIG. 18 shows a possible timing diagram for the RAM-server shown in FIG. 17;

FIG. 19 shows schematically a shift register operating in time-sharing application mode;

FIG. 20 is a timing diagram showing the timing operation for the shift register shown in FIG. 19;

FIG. 21 shows schematically a RAM-Server combination operating in a time-sharing application environment;

FIG. 22 shows schematically the connectivity required to create a cell formed from two cells so as to have a larger data bus;

FIG. 23 shows schematically the connectivity required during loading of the cells with data;

FIG. 24 shows schematically a device configured to perform an 8-bit command;

FIG. 25 shows a standalone RAM cell without a latch;

FIG. 26 shows schematically a non-optimized adder using the cell shown in FIG. 25;

FIG. 27 shows schematically an improved adder using the cell shown in FIG. 25;

FIG. 28 shows schematically a latch that may be used independently of the RAM shown in FIG. 25;

FIG. 29 shows schematically a device where a clock enable signal is used to adjust the effective clock rate;

FIG. 30 shows schematically a multiple MCM architecture allowing fast switching between different states of the programmable matrix;

FIG. 31 is a flow diagram showing the principal operating steps used by a first method for deriving construction data for implementing the device according to the invention;

FIG. 32 is a flow diagram showing the principal operating steps used by a second method for using a library to extract and store construction data;

FIG. 33 is a flow diagram showing the principal operating steps used by a third method for deriving construction data for implementing the device according to the invention;

FIG. 34 is a flow diagram showing the principal operating steps used by a method for implementing the device according to the invention;

FIG. 35 is a flow diagram showing the principal operating steps used by a method for simulating an application using the device according to the invention;

FIG. 36 is a flow diagram showing the principal operating steps used by a method for emulating an application using the device according to the invention;

FIG. 37 is a flow diagram showing the principal operating steps used by a method for using the device according to the invention to facilitate ASIC design;

FIG. 38 is a flow diagram showing the principal operating steps used by a method for avoiding use of faulty cells in the device during implementation of an application using the device;

FIG. 39 is a flow diagram showing the principal operating steps used by a method for fault correction of faulty cells in the device during real-time operation of an application using the device; and

FIGS. 40 and 41 are flow diagrams showing processes according to the invention for hardware implementation of IC chips on electronic cards.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 2 shows schematically the basic architecture of a device 100 according to the invention. The device architecture is a collection of cells 101 interconnected via at least one programmable matrix 102. A cell 103 may be built out of smaller cells 101. Likewise, each of the cells 101 or 103 together with the associated matrix 102 may form part of a block such as 104 and 105. Any block has the same architecture as the whole device 100. Any block can be configured as a single cell. Although any connection can be made between the output of one block and the input of the same or another block, a particular interconnection between the internal cells of two blocks may not always be possible. This allows a block to be associated with a "level" being the number of blocks containing the block. Thus, for example, a block of level 0 is the device itself; blocks of level 1 form the device; and blocks of level 2 form blocks of level 1. In saying this, it should be noted that FIG. 2 is schematic and whether the programmable matrix 102 is shown within or outside the boundary of the block is immaterial, since in either case all cells within a block must be connected to at least one programmable matrix. Those connections of a cell internal to a block that enable connection to outside the block are defined as the port of the block.

It is further to be noted that a block could form a cell or a few cells. Likewise, the device 100 is itself a block containing multiple cells interconnected by a programmable matrix and any block thus has an architecture similar to that of the device 100 and may indeed be regarded as a device. The device 100 thus contains multiple like devices and may be regarded as a cell formed of multiple like cells.
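By way of illustration only, the notion of a block "level" may be sketched in Python as follows. This is a minimal sketch and not part of the device: the class name Block, its fields and the block names are hypothetical, and the programmable matrix of each block is deliberately omitted so that only the nesting is modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Block:
    """Hypothetical model of a block: its own cells plus nested blocks."""
    name: str
    cells: int = 0                       # cells held directly by this block
    sub_blocks: List["Block"] = field(default_factory=list)


def block_levels(block: Block, level: int = 0,
                 out: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Map each block name to its level, i.e. the number of blocks containing it."""
    if out is None:
        out = {}
    out[block.name] = level
    for sub in block.sub_blocks:
        block_levels(sub, level + 1, out)
    return out


def total_cells(block: Block) -> int:
    """Count the cells of a block, treating the whole block as one larger cell."""
    return block.cells + sum(total_cells(sub) for sub in block.sub_blocks)


# The device is itself a block (level 0) containing like blocks.
inner = Block("inner", cells=4)
device = Block("device", cells=2,
               sub_blocks=[Block("left", cells=3),
                           Block("right", cells=1, sub_blocks=[inner])])

print(block_levels(device))
print(total_cells(device))
```

Because the same Block type is used at every depth, the sketch reflects the statement that any block has the same architecture as the whole device.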

The name "block" allows distinction to be made between the complete device 100 and a component thereof having similar architecture, although this distinction pertains only to the description and is made for clarity. So far as the claims are concerned, no distinction is made between the complete device and any component thereof having similar architecture. Indeed, an essential feature of the invention resides in the fact that the architecture of a component of the device may be similar to the architecture of the device as a whole. By the same token, since a block is itself a device it can be realized in different ways and thus a device can contain two or more blocks having different structures.

It should be noted that the matrix 102 does not have to be a single entity but can be split into sections. Likewise, it will be seen that the block 105 contains multiple groups of cells of which two are identified as 106 and 107, each containing a possibly different number of cells 101 and both being served by a single matrix 108. The block 105 together with its constituent cells, and any other constituent elements, is also served by a second matrix 109 shown external to the block 105. Each of the matrices 108 and 109 is typically of identical structure to the matrix 102 and whether it is shown inside the block or external thereto is merely a matter of convenience. Thus, the manner in which the matrix is depicted in the figures is schematic for illustration only. It should also be noted that part of the connections available in one matrix might be duplicated in another matrix. Any such duplication will be removed in the implementation. In practice, as will be explained below with reference to FIG. 8, the matrix is simply a collection of switches (such as CMOS switches) each controlled by a Flip-Flop, such that by writing logic "1" or "0" to the corresponding Flip-Flop, the switch may be closed or opened thereby allowing the cells to be connected according to any required topology. The Flip-Flops relating to all the switches of the matrix are arranged in groups and are associated with auxiliary circuitry enabling any one of the Flip-Flops to be selected for the purpose of writing data thereto. Thus, the Flip-Flops and associated auxiliary circuitry may be realized by a RAM and will be referred to as "Matrix Control Memory". Optionally, data in the Matrix Control Memory can also be read.

The device architecture shown in FIG. 2 may bring to mind "fractal" structures used in mathematics to describe any of a class of complex geometric shapes that exhibit the property of self-similarity.

The input pins and the output pins of the device are connected to the programmable matrix: the input pins to the matrix input; the output pins from the matrix output. In order to access a lower level directly from the input or from the output, the port of the block should be used.

FIG. 3 shows schematically a cell 110 according to a first embodiment. The cell 110 comprises a random access memory (RAM) 111 having (n+m) address lines 112, which are shown as two separate buses although they function as a single address bus whose minimum number of address bits (m+n) can be one. A data bus 113 allows data stored in addressable memory locations of the RAM 111 to be read out and accommodates a number of data bits d whose minimum number is also one. Data appearing on the data bus is latched by a latch 114 whose output 115 constitutes an output of the cell 110. The RAM 111 can be loaded with the desired data.
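By way of illustration only, the behavior of a cell of the kind shown in FIG. 3 may be sketched in Python as a look-up table whose addressed word is captured by an output latch. This is a minimal sketch under stated assumptions: the class name Cell and its methods are hypothetical, the two address buses are treated as one, and timing is reduced to a single clock() call per latch operation. The counter at the end assumes, purely for the example, that the cell output is routed back to the cell address.

```python
class Cell:
    """Hypothetical model of a FIG. 3 style cell: RAM read-out into an output latch."""

    def __init__(self, address_bits, data_bits):
        self.address_bits = address_bits
        self.data_bits = data_bits
        self.ram = [0] * (1 << address_bits)   # one word per addressable location
        self.output = 0                        # contents of the output latch

    def load(self, table):
        """Load the RAM; the stored data defines the function of the cell."""
        if len(table) != len(self.ram):
            raise ValueError("table must hold one word per address")
        mask = (1 << self.data_bits) - 1
        self.ram = [word & mask for word in table]

    def clock(self, address):
        """Latch the word at the given address and return the new cell output."""
        self.output = self.ram[address & ((1 << self.address_bits) - 1)]
        return self.output


# A 2-bit address, 1-bit data cell loaded with the XOR truth table.
xor_cell = Cell(address_bits=2, data_bits=1)
xor_cell.load([0, 1, 1, 0])
print(xor_cell.clock(0b01))

# Assumed feedback wiring: output routed back to the address gives a counter.
counter = Cell(address_bits=3, data_bits=3)
counter.load([(a + 1) % 8 for a in range(8)])
states = [counter.clock(counter.output) for _ in range(9)]
print(states)
```

Changing the loaded table changes the function without changing the structure of the cell, which is the point made in the description.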

FIG. 4 shows schematically a cell 120 according to a second embodiment. The cell 120 comprises a RAM 121 having (n+m) address lines 122, which are again shown as two separate buses although they function as a single address bus whose minimum number of address bits (m+n) can be one. A data bus 123 allows data stored in addressable memory locations of the RAM 121 to be read out and accommodates a number of data bits d whose minimum number is also one. In this case, the data appearing on the data bus 123 constitutes an output of the cell 120. The address appearing on the address buses 122 is latched by a respective latch 124. The RAM 121 can be loaded with the desired data in a manner described below. Although two latches 124 are shown at the input of the RAM 121, they are referred to as "the latch" of the cell 120, no distinction being made to the actual number of latches used to latch the address.

The RAMs 111 and 121 as well as the latches 114 and 124 shown in FIG. 3 and FIG. 4 are part of the device 100 shown in FIG. 2. In a particular embodiment reduced to practice, the RAM was modeled on IDT6116 of Integrated Device Technology, Santa Clara, Calif., USA 95054, and the latch was modeled on SN74HC374 of Texas Instruments Incorporated, Dallas, Tex. USA, 75380-9066. In both configurations of the cell, the values of n, m and d may be assigned as required according to an application to be implemented using the device.

FIG. 5 shows schematically the logical cell of FIG. 4 in more detail. The figure shows a cell 130 comprising a RAM 131 having (m+n)-bit address bus 132 and a d-bit data bus 133. Latches 134a and 134b are used to latch the address on the address bus 132. Again, it is to be noted that the address bus and the latch 134 are shown split by way of illustration only. Functionally, there is only a single address bus and the latches may be considered as a single latch. Logic signals OE and {overscore (OE)} are latched by a latch 136, which can also be considered as part of, or an extension to, the latch 134, and are fed via auxiliary circuitry 137 to the output enable (OE) of the RAM 131 and may cause the RAM to be in a tri-state condition. Logic signals CS and {overscore (CS)} are also latched by the latch 136 and fed via the auxiliary circuitry 137 so as to allow the RAM 131 to be selected or deselected. The number of pairs of the CS and {overscore (CS)} signals is such as to enable mapping the whole block (or device) into a single cell. A clock is routed to the latches 134 and 136 and can be enabled or disabled by a clock enable signal (CE) that may also be routed via a matrix. The signals OE, {overscore (OE)}, CS1, {overscore (CS1)}, CS2, {overscore (CS2)}, CS3, {overscore (CS3)}, CS4, {overscore (CS4)} and so on, and CE are such that when not connected, they are set to default values, such that an active low signal is set to LOW and an active high signal is set to HIGH, i.e. to their enabled states. As against this, the default value of WE when not connected is set to its disabled state. When the RAM is not selected (chip select is not active), it is both in tri-state condition and write disabled.

There will now be described a possible timing implementation for the device based on the cell of FIG. 5. (a) A single "master clock" is used for all the cells; (b) There is no "Write" signal as the master clock is the "write" signal. The write is active or not--depending on the "Write Enable" signal; (c) As the "Output Enable" signals and the "Write Enable" signal are also latched by the master clock, there is no timing race or conflict between the "Write" signal and any other signal. (d) There is no conflict between the latch operation and the write operation as the beginning of the latch operation is the end of the write operation. This notwithstanding, it may be desirable to decrease the pulse width of the write signal slightly in order to increase the safety margin whilst maintaining the same cycle, which is equivalent to the time between two consecutive clock pulses to the latches. (e) When a write occurs, data can be routed to the required cell via the matrix from the input of the device or from other cells. In the latter case, the respective data buses of the two cells are interconnected. The output enable of the RAM to which data is being written is in the output disable state (tri-state), and the output enable of the RAM from which data is being written is active. (f) OE and chip select signals are provided as described above with reference to FIG. 5. The OE signal may be used for write operation, multiplexing between the cells, and so on; and the chip select signals may be used for the creation of bigger cells as is described below with reference to FIG. 6.
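By way of illustration only, points (a) to (e) may be sketched in Python as a cycle-level model in which one call represents the master clock edge and a second call represents the remainder of the cycle. This is a simplified sketch, not a timing-accurate description: the class name ClockedCell and its methods are hypothetical, the matrix is replaced by passing the bus value directly, and rejecting a write with the output enabled is an assumption made to reflect point (e).

```python
class ClockedCell:
    """Hypothetical cycle-level model of a FIG. 5 style cell."""

    def __init__(self, address_bits):
        self.ram = [0] * (1 << address_bits)
        self.address = 0
        self.write_enable = False    # WE defaults to its disabled state
        self.output_enable = True    # OE defaults to its enabled state

    def clock_edge(self, address, write_enable=False, output_enable=True):
        """Master clock edge: address, WE and OE are latched together."""
        if write_enable and output_enable:
            # Assumption: a cell being written must be in tri-state.
            raise ValueError("output must be disabled while writing")
        self.address = address % len(self.ram)
        self.write_enable = write_enable
        self.output_enable = output_enable

    def cycle(self, data_bus=None):
        """Rest of the cycle: write from the bus, or drive the bus, or tri-state.

        Returns the word driven onto the data bus, or None for tri-state.
        """
        if self.write_enable:
            self.ram[self.address] = data_bus   # the clock itself acts as "write"
            return None
        return self.ram[self.address] if self.output_enable else None


# Copy one word from a source cell to a destination cell in a single cycle.
src = ClockedCell(address_bits=2)
dst = ClockedCell(address_bits=2)
src.ram[3] = 0xA5

src.clock_edge(3)                                           # source drives the bus
dst.clock_edge(1, write_enable=True, output_enable=False)   # destination is tri-state
dst.cycle(src.cycle())

dst.clock_edge(1)            # next cycle: read back the written word
print(hex(dst.cycle()))
```

Because the control signals are captured on the same edge as the address, the model has no separate write strobe whose timing could conflict with them.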

For devices suited for applications where no write operation is needed at all, the clock cycle can be shorter to create faster (real-time) applications and the master clock low-to-high transition can occur when the cell output is ready. This will become clearer from the description of the RAM-Server combinations shown in FIG. 15 and FIG. 17 and the timing diagrams shown in FIG. 16 and FIG. 18.

FIG. 6 shows an example for connecting two cells each having an n-bit address so as to form a composite cell 140 having double the size, i.e. twice the number of addressable locations addressed by an (n+1)-bit address bus. To the extent that the components of each constituent cell are identical to those described above with reference to FIG. 5, similar reference numerals are used in FIG. 6. Thus, the composite cell 140 contains two RAMs identified as 131a and 131b both having an n-bit address bus 132a and 132b, respectively. Thus, the n least significant bits of the combined address are fed via respective latches 134a and 134b to the corresponding RAMs 131a and 131b. By way of illustration, the (n+1)-bit address fed to the combined cell is derived from a RAM 142 having an m-bit address bus and an (n+1)-bit data bus, an m-bit address being fed thereto via an m-bit latch 143. The data buses 133 of the two RAMs 131a and 131b are connected via the matrix, each data output being tri-state so that only the data on a selected one of the RAMs is output. The MSB of the (n+1)-bit address bus is used to control which of the two RAMs 131a and 131b feeds data to the data bus 133. To this end, it is connected to the {overscore (CS1)} input of the latch 136a controlling the RAM 131a and to the CS1 input of the latch 136b controlling the RAM 131b.

Operation of the circuit is as follows. If the MSB of the data of RAM 142 that is routed to the MSB of the combined (n+1)-bit address is 0, then the RAM 131a is enabled and the RAM 131b is disabled. Conversely, if the MSB of the data of RAM 142 that is routed to the MSB of the combined (n+1)-bit address is 1, then the RAM 131a is disabled and the RAM 131b is enabled. Referring back to the auxiliary circuitry 137 shown in FIG. 5, the CS1 input is fed to a first logical AND-gate 145 whose output is ACTIVE only if all its inputs are enabled. As noted above, any inputs not connected by the matrix are automatically enabled so that the output of the logical AND-gate 145 is ACTIVE if CS1 is enabled and is INACTIVE if CS1 is disabled. Likewise, the {overscore (CS1)} input is fed to a second logical active-LOW AND-gate 146 whose output is ACTIVE only if all its inputs are enabled (active LOW). Again, since any inputs not connected by the matrix are automatically enabled, the output of the logical AND-gate 146 is ACTIVE if {overscore (CS1)} is enabled and is INACTIVE if {overscore (CS1)} is disabled. Thus, if the MSB is LOW, then {overscore (CS1)} of the RAM 131a is enabled and the RAM 131a is operative and if the MSB is HIGH, then CS1 of the RAM 131b is enabled and the RAM 131b is operative. So when the RAM 131a is ACTIVE, the RAM 131b is INACTIVE and conversely when the RAM 131a is INACTIVE, the RAM 131b is ACTIVE.
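By way of illustration only, the chip-select logic and the resulting composite cell may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names aux_select, SmallRam and composite_read are hypothetical, an unconnected input is represented by None, a deselected (tri-state) output is also represented by None, and latching is omitted.

```python
def aux_select(cs_inputs, ncs_inputs):
    """Model of the select logic: active only if every input is enabled.

    cs_inputs are active-HIGH, ncs_inputs are active-LOW. None stands for an
    input not connected by the matrix, which takes its enabled default.
    """
    cs_ok = all(True if v is None else bool(v) for v in cs_inputs)
    ncs_ok = all(True if v is None else not v for v in ncs_inputs)
    return cs_ok and ncs_ok


class SmallRam:
    """Hypothetical n-bit-address RAM with tri-state data output."""

    def __init__(self, n, contents):
        if len(contents) != 1 << n:
            raise ValueError("contents must hold 2**n words")
        self.n = n
        self.contents = list(contents)

    def read(self, address, cs=(None,), ncs=(None,)):
        """Return the addressed word, or None when deselected (tri-state)."""
        if not aux_select(cs, ncs):
            return None
        return self.contents[address & ((1 << self.n) - 1)]


def composite_read(ram_a, ram_b, address):
    """Read an (n+1)-bit address from two n-bit RAMs sharing one data bus."""
    n = ram_a.n
    msb = (address >> n) & 1
    low = address & ((1 << n) - 1)
    out_a = ram_a.read(low, ncs=(msb,))   # MSB wired to the active-LOW select
    out_b = ram_b.read(low, cs=(msb,))    # MSB wired to the active-HIGH select
    driven = [word for word in (out_a, out_b) if word is not None]
    assert len(driven) == 1               # exactly one RAM drives the bus
    return driven[0]


ram_a = SmallRam(2, [10, 11, 12, 13])
ram_b = SmallRam(2, [20, 21, 22, 23])
print([composite_read(ram_a, ram_b, addr) for addr in range(8)])
```

The same function could be applied again to two composite cells using a second pair of select inputs, which is how the extension described in the next paragraph would be modeled.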

In exactly the same way, two RAMs 140 can be combined, in which case the CS2 and {overscore (CS2)} signals are also used for accommodating the two most significant bits of the address. Such extension can be repeated at will to produce a RAM having as many addressable memory locations as required by a specific application. It should also be noted that the two RAMs 131a and 131b are shown in FIG. 6 as having address buses of equal size. However, this need not be the case and an application may, and not uncommonly will, dictate a topology where RAMs having different size address buses are combined.

The "Clock Enable" signal may be considered as an input to the cell, although it is used mainly during design for debugging purposes.

FIG. 7 shows schematically a matrix 150 connecting D cell outputs 151 to A cell inputs 152.

FIG. 8 shows schematically a saturated matrix 155 having four input lines and three output lines that may be used in the circuit of FIG. 7. The matrix 155 has to be able to connect each of the cell output lines 151 to each of the cell input lines 152 in the block. Each cell output 151 is connected to a respective input of the matrix 155 designated by alphabetic characters a, b, c, d. The matrix 155 serves to allow connection of each cell output 151 to one or more cell inputs 152 designated by numeric characters 1, 2, 3. There is in practice no reason to have the ability to connect each cell output 151 to all the possible cell inputs 152, thus permitting use of a matrix that is not saturated as shown in FIG. 9. However, it makes the automation simpler to use the saturated matrix 155 as in FIG. 8, and operation is faster in real-time. Although the cell could be made out of smaller cells, and in some applications, the bigger cell would not be constructed, the output and the input of the bigger cell are pre-determined.

Operation of the saturated matrix 155 is as follows. Each of the inputs a, b, c, d is connected to each of the outputs 1, 2, 3 via corresponding switches. Thus, the inputs a, b, c, d are connected to the output 1 via switches a1, b1, c1, d1. Likewise, the inputs a, b, c, d are connected to the output 2 via switches a2, b2, c2, d2; and they are connected to the output 3 via switches a3, b3, c3, d3. In order to connect input a to output 1, the switch a1 is closed. In order to connect c to 3, the switch c3 is closed. In order to connect b to both 1 and 3, the switches b1 and b3 are both closed. In order to connect both b and d to both 2 and 3 the switches b2, b3, d2, and d3 are closed, and so on.
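By way of illustration only, the switch naming used above may be sketched in Python as follows. This is a minimal sketch: the class name SaturatedMatrix and its methods are hypothetical, and only the set of closed switches is modeled, not the electrical behavior of the lines.

```python
class SaturatedMatrix:
    """Hypothetical model: one switch for every (input, output) pair."""

    def __init__(self, inputs, outputs):
        self.inputs = list(inputs)
        self.outputs = list(outputs)
        self.closed = set()            # (input, output) pairs whose switch is closed

    def connect(self, inp, out):
        """Close the switch joining one input line to one output line."""
        if inp not in self.inputs or out not in self.outputs:
            raise KeyError((inp, out))
        self.closed.add((inp, out))

    def closed_switches(self):
        """Names of the closed switches, e.g. 'b3' for input b to output 3."""
        return sorted(f"{inp}{out}" for inp, out in self.closed)

    def inputs_on(self, out):
        """Input lines currently connected to the given output line."""
        return sorted(inp for inp, o in self.closed if o == out)


# Four inputs a..d and three outputs 1..3, as in the matrix 155.
matrix = SaturatedMatrix("abcd", [1, 2, 3])

# Connect both b and d to both 2 and 3.
for inp in "bd":
    for out in (2, 3):
        matrix.connect(inp, out)

print(matrix.closed_switches())
print(matrix.inputs_on(3))
```

A saturated matrix of D inputs and A outputs modeled this way has D x A possible switches, here twelve.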

Each switch has a control line (not shown) that sets the switch to "closed" or "open" and is connected to a 1-bit memory that stores the state of the switch. As in practice there are a great many switches, all the bits that store the switch states are arranged in a memory structure. In other words, there is a memory unit that stores the switches' states. Each bit in the memory is connected to one control line, there being the same number of bits in the memory as the number of switches in the matrix. By such means, the memory functions as a Matrix Control Memory for controlling whether the state of each switch is closed or open. In the above example, the matrix 155 connects a 4-bit data bus to a 3-bit address bus. However, it will be appreciated that the matrix 155 can equally well be connected with the lines a, b, c, d forming the output and the lines 1, 2, 3 forming the input so as to connect a 3-bit data bus to a 4-bit address bus.
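The relationship between the switches and the Matrix Control Memory can be modelled in a few lines. The following Python sketch is illustrative only: the class name SaturatedMatrix and its methods are hypothetical and are not part of the described device. It keeps one bit per switch and reports which input values reach each output.

```python
class SaturatedMatrix:
    """Toy model of the saturated matrix of FIG. 8 (all names are illustrative)."""

    def __init__(self, inputs, outputs):
        self.inputs = list(inputs)      # e.g. a, b, c, d
        self.outputs = list(outputs)    # e.g. 1, 2, 3
        # Matrix Control Memory: one bit per switch, every switch initially open.
        self.control = {(i, o): 0 for i in self.inputs for o in self.outputs}

    def close(self, inp, out):
        """Close one switch, e.g. close('a', 1) closes switch a1."""
        self.control[(inp, out)] = 1

    def route(self, values):
        """Return, for each output, the list of input values connected to it."""
        return {
            o: [values[i] for i in self.inputs if self.control[(i, o)]]
            for o in self.outputs
        }


m = SaturatedMatrix("abcd", [1, 2, 3])
m.close("a", 1)   # connect input a to output 1
m.close("c", 3)   # connect input c to output 3
routed = m.route({"a": 1, "b": 0, "c": 1, "d": 0})
```

With four inputs and three outputs the control memory in this model holds twelve bits, one per switch, which matches the switch list a1 to d3 given above.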

FIG. 9 shows schematically an example of a non-saturated matrix 156 comprising a plurality of interconnected saturated matrices 155 as shown in FIG. 8, each having its own memory to control each switch thereof. All the memories are organized as one big memory that functions as the Matrix Control Memory. Programming the matrix is achieved by loading the matrix control memory with the appropriate data, as described below, and sets the desired topology of the device.

Such a matrix 156, constructed so as to have a limited but sufficient number of connections, is preferred over an equivalent saturated matrix having the same number of switching connections as it saves die space, though the code for choosing the links (routings) is slightly more complicated. Thus, assuming that each matrix 155 is saturated and denoting:

D=the number of input lines to the matrix,

A=the number of the output lines of the matrix,

X=the number of the input matrices 155,

Y=the number of the output matrices 155, and

Z=the number of the middle column matrices 155,

X, Y and Z are calculated as follows:

[Equation EQU00001: expressions for X, Y and Z, involving a ceiling function and a .gtoreq. (greater than or equal to) relation] where "ceiling" denotes that a non-integer number is rounded up to the next highest integer.

Each of the input matrices is connected to each of the middle column matrices. Each of the output matrices is connected to each of the middle column matrices. To prevent cross-connection limitations, it is possible to increase Z. Even so, the number of switches and the associated memory will be much smaller than in a saturated matrix with the same number of input pins and output pins. This is particularly important when a single matrix is used to connect all the cells in all levels. It will be noted that the likelihood that the end-user will connect two cells in the same block is greater than the likelihood that he will connect two cells in different blocks, owing to the tendency to attempt to combine cells to form larger cells. Therefore, when such a single matrix is used, it is advisable during design of the device to take into account to which input and output matrices the pins of the cells are connected, since from these cells the end-user may choose to form larger cells. It should also be noted that if, instead, separate matrices are provided within each block, the cumulative delay for some connections is likely to be greater than if a single matrix were used. Account must also be taken of the need to provide connections in the matrix to the input and output of the device in addition to the interconnections between the outputs of the cells and the inputs of the cells.
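The saving in switches can be illustrated numerically. The Python sketch below rests on assumptions that are not taken from the specification: each input matrix is assumed to have d input lines, each output matrix a output lines, X and Y are assumed to be ceiling(D/d) and ceiling(A/a), and Z is chosen freely. The function names are hypothetical.

```python
import math


def three_stage_switches(D, A, d, a, Z):
    """Count switches in an assumed three-column arrangement of saturated matrices.

    Assumptions (not from the source): d inputs per input matrix,
    a outputs per output matrix, and full connection between columns.
    """
    X = math.ceil(D / d)          # assumed number of input matrices
    Y = math.ceil(A / a)          # assumed number of output matrices
    input_stage = X * (d * Z)     # each input matrix: d inputs x Z outputs
    middle_stage = Z * (X * Y)    # each middle matrix: X inputs x Y outputs
    output_stage = Y * (Z * a)    # each output matrix: Z inputs x a outputs
    return input_stage + middle_stage + output_stage


def saturated_switches(D, A):
    """A single saturated matrix needs one switch per input/output pair."""
    return D * A


staged = three_stage_switches(D=64, A=64, d=8, a=8, Z=8)
full = saturated_switches(64, 64)
```

Under these assumed sizes the staged arrangement needs 1536 switches against 4096 for a single saturated matrix, and the same ratio applies to the control memory bits. Increasing Z raises the staged count, which is the trade-off against cross-connection limitations noted above.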

In order to understand how the device may be used to implement different hardware applications merely by selecting a required topology and downloading data into the storage elements of each of the cells, various examples will now be described. For ease of explanation, some examples are based on the cell 110 shown in FIG. 3, although the device works in the same manner using the cell 120 of FIG. 4. In the following examples, components that are common to the cell shown in FIG. 3 and the matrix shown in FIG. 8 will be referred to by identical reference numerals.

EXAMPLE 1

Counter

FIG. 10 shows schematically a counter 160 using one cell 110 comprising a RAM 111 having an n-bit data output bus 113 fed to an n-input latch 114, whose output 115, constituting the cell's output, is connected to the input of the matrix 155. Each of the cell's n output lines is connected via the matrix 155 to a respective address line of the RAM's address bus 112. The RAM is loaded with the following data:

TABLE-US-00001
Address    Data
0          1
1          2
2          3
. . .      . . .
K-1        K
K          0

In steady state, there is a "number" at the output 115 of the latch 114 that defines the "address" of the RAM 111. Therefore, the "data" of the RAM--the input of the latch--is set by the table. For all addresses apart from the last, the data in any addressable location of the RAM is equal to one more than the address thereof and this becomes the new address on the next clock pulse. Thus, each time the RAM is clocked, the latch 114 latches the address of the next addressable location whose data is equal to the current data plus 1. After a delay time, the RAM is ready for a new clock and the cycle repeats and the output is successively incremented.
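The feedback loop of the counter can be simulated directly. This Python sketch is a simplified model rather than a description of the hardware: the RAM is a list indexed by address, the latch is a single variable, the matrix is reduced to a straight-through connection, and the value K = 5 is chosen only for illustration.

```python
K = 5                                    # highest count (illustrative value)

# RAM contents as in the table: each address holds address + 1, and K holds 0.
ram = [(addr + 1) % (K + 1) for addr in range(K + 1)]

latch = 0                                # latch output, fed back as the RAM address
seen = []
for _ in range(8):                       # eight clock pulses
    seen.append(latch)
    latch = ram[latch]                   # the latch captures the data at the current address
```

After eight clock pulses the recorded outputs run from 0 up to K and then wrap to 0, showing that the count sequence is set entirely by the RAM contents.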

EXAMPLE 2

Up-Down Counter

FIG. 11 shows schematically an Up-Down counter 165 substantially identical to the counter shown in FIG. 10, except that the RAM 111 is constructed to store two tables in respective addressable locations thereof. This, of course, requires that the RAM be twice as large as that used in the counter of FIG. 10, or that the range of the counter be half. In either case, one bit of the address is used to set a new area in the RAM for storing data that, when fed to the remaining bits of the address bus, will point to a new address the value of whose data is one less than the address. The new area in the RAM is loaded with the following data:

TABLE-US-00002
Address    Data
0          K
1          0
2          1
3          2
. . .      . . .
K-1        K-2
K          K-1

The Up/Down signal is also routed from the matrix 155, as are all the inputs of the cells.
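A minimal simulation of the up-down counter follows. It assumes, for illustration only, that the Up/Down signal acts as the most significant address bit and that each table occupies K+1 locations; a real RAM would typically use a power-of-two region size. The function name clock is hypothetical.

```python
K = 5
SIZE = K + 1

up = [(a + 1) % SIZE for a in range(SIZE)]      # first table: a -> a+1, K -> 0
down = [(a - 1) % SIZE for a in range(SIZE)]    # second table: 0 -> K, a -> a-1
ram = up + down                                 # the direction bit selects the region


def clock(count, count_down):
    """One clock pulse; count_down models the Up/Down address bit (1 = down)."""
    return ram[count_down * SIZE + count]


count = 0
trace = []
for direction in [0, 0, 0, 1, 1, 1, 1]:         # three up pulses, then four down pulses
    count = clock(count, direction)
    trace.append(count)
```

The trace counts up to 3, back down to 0, and then wraps to K, as the second table requires.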

EXAMPLE 3

Delay

In order to achieve delay, the RAM is redundant. Therefore, the RAM 111 can be coded simply to transfer the address to the data, e.g. at address "0" the data is "0", at address "1" the data is "1" and so on. On each clock signal, the data at the input of the cell is latched and routed out directly via the RAM, so that a delay of one clock is achieved. The connectivity of such a delay is also used in the Shift Register described below in Example 4 and shown in FIG. 13.

EXAMPLE 4

Shift Register

FIG. 12 shows schematically a shift register 170 using three cells in a block and comprising respective RAMs 111a, 111b and 111c each having an n-bit data output bus fed to a respective n-input latch 114a, 114b and 114c, whose respective outputs 115a, 115b and 115c combine to form a 3n-bit output data bus 115, constituting the cell's output. The matrix 155 is programmed to connect the (n-1) LSBs of each of the output data buses 115a, 115b and 115c to respective address bits of the corresponding RAMs 111a, 111b and 111c. Likewise, the MSB of each of the output data buses 115a, 115b and 115c is fed by the matrix 155 to the LSB of the next RAM except for the MSB of the data bus 115c, which is simply discarded by the shift register although it may be used by the application. The data in each RAM is coded so as simply to transfer the address to the data, e.g. at address "0" the data is "0", at address "1" the data is "1" and so on, as is done in the delay described in Example 3 above. The topology of the shift register is designed to produce the required shift and the matrix 155 is programmed to achieve the necessary connectivity.

It will be understood that the connectivity of the matrix 155 is not shown in FIG. 12 for the sake of clarity. However, in order that operation of the shift register 170 be clear, the connectivity and operation of only the RAM 111a will now be explained with reference to FIG. 13 where identical reference numerals are used to denote those components in FIG. 12.

FIG. 13 shows a one-cell shift register 175 including a RAM 111a having an 8-bit address bus (i.e. n=8), and each of whose data bits is latched by a latch 114a and connected by the matrix (not shown) to the next more significant bit of the RAM's address bus. Thus, denoting the address bits by A.sub.0, A.sub.1, A.sub.2, . . . A.sub.7 where A.sub.0 is the LSB and A.sub.7 is the MSB and the data bits by D.sub.0, D.sub.1, D.sub.2, . . . D.sub.7, the least significant data bit D.sub.0 is connected to the address bit A.sub.1, data bit D.sub.1 is connected to the address bit A.sub.2 and so on. The most significant data bit D.sub.7 is discarded or fed to the next stage of the shift register if several one-cell shift registers are to be connected in cascade as in FIG. 12.

As noted above, the data in the RAM 111a is coded simply to transfer each address bit to the corresponding data bit, e.g. at address "0" the data is "0", at address "1" the data is "1" and so on. By such means the data on each address line is simply output by the RAM 111a and latched by the latch 114a. On the next clock pulse, each data bit is fed by the matrix to the next address line, whereby on successive clock pulses, data fed to the LSB address bit A.sub.0 ripples through the shift register.
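The one-cell shift register of FIG. 13 can be sketched as follows. This is a simplified Python model under stated assumptions: the matrix wiring is written out as a list operation, the serial input is applied to address bit A.sub.0, and the function name clock is hypothetical.

```python
N = 8
ram = list(range(2 ** N))      # identity coding: the data equals the address


def clock(latch_bits, serial_in):
    """One clock pulse; latch_bits[i] is the latched data bit D_i."""
    # Matrix wiring: D_i drives A_(i+1), A_0 takes the serial input, D_7 is discarded.
    address_bits = [serial_in] + latch_bits[:-1]
    address = sum(bit << i for i, bit in enumerate(address_bits))
    data = ram[address]
    return [(data >> i) & 1 for i in range(N)]


state = [0] * N
for bit in [1, 0, 1, 1]:       # shift in four bits, one per clock pulse
    state = clock(state, bit)
```

After the four pulses the first bit shifted in has reached position 3, while the RAM itself has done nothing but pass each address through unchanged: the shift comes entirely from the wiring.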

An alternative approach that may be used to implement a one-cell shift register is to program the matrix to transfer the lines directly, such that D.sub.1 is connected to the address bit A.sub.1, data bit D.sub.2 is connected to the address bit A.sub.2 and so on. In this case, the shift is done by loading the RAM with the following data:

TABLE-US-00003
Address    Data
0          0
1          2
2          4
3          6
4          8
5          A.sub.H

and so on. If a reverse shift is required then the RAM is loaded with the following data:

TABLE-US-00004
Address    Data
0          0
1          0
2          1
3          1
4          2
5          2

and so on.
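Both tables follow a simple rule that can be generated programmatically. The Python sketch below assumes an 8-bit RAM and assumes that the forward shift discards the bit shifted out of the most significant position; the specification shows only the first six entries of each table, so the continuation is an inference.

```python
N = 8
MASK = (1 << N) - 1

# Forward shift: the data is the address shifted one place towards the MSB.
shift_table = [(addr << 1) & MASK for addr in range(1 << N)]

# Reverse shift: the data is the address shifted one place towards the LSB.
reverse_table = [addr >> 1 for addr in range(1 << N)]
```

The first six entries of shift_table are 0, 2, 4, 6, 8 and hexadecimal A, and the first six entries of reverse_table are 0, 0, 1, 1, 2, 2, in agreement with the two tables above.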

It will be understood that in both the 3-cell shift register of FIG. 12 and the one-cell shift register of FIG. 13, the least significant address line is also connected to the matrix. However, it is shown "detached" in both figures in order to emphasize the shift operation.

The following points should also be noted. The maximum rate of this shift register is determined by the clock whose frequency must take into account the delay of one cell only plus the delay of the matrix. Also, with reference to the previous discussion relating to "levels", the input pins and the output pins of the device are connected to the "level 1" programmable matrix: the input pins to the matrix input; the output pins from the matrix output. If the application is implemented in "level 1", the input pin carrying the LSB of the shift register can be routed from one of the input pins of the device, and the output pins 115 can be routed to the output pins of the device. If the application is implemented in a "level" other than "1", only the pins from the port of the block in that level can be routed to the I/O pins of the device.

EXAMPLE 5

Noise Generator

A noise generator may be constructed based on the shift register illustrated in FIG. 12. As is known to those skilled in the art, a noise generator may be formed by XNOR-ing some of the outputs of the shift register and feeding the output of the XNOR back to the input address LSB so as to ripple through the shift register. The address-data relation in the RAMs is programmed to achieve the XNOR operation. In the simple case where two data bits of the same RAM are to be XNOR-ed, it is straightforward to program the data in the RAM according to the desired truth table of the device. However, if XNOR has to be performed between data bits of two separate RAMs, then one of those data bits must be connected by the matrix to form an input to the other RAM too. Consider, for example, that in the shift register 170 shown in FIG. 12, one of the data bits of the RAM 111a is to be XNOR-ed with one of the bits of the RAM 111b. The matrix 155 should connect the required data bit in the output 115b of the RAM 111b to input 112a of the RAM 111a and the XNOR result must be passed to one of the data output bits 115a of the RAM 111a. As a result, the RAM 111a has fewer remaining bits to perform the shift function.
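The simple single-RAM case can be sketched in Python. The model below makes several assumptions for brevity: the register is only 4 bits wide, the two most significant bits are chosen as the XNOR taps, and the shift and the XNOR are both folded into the RAM contents rather than splitting the shift into the matrix wiring.

```python
N = 4
MASK = (1 << N) - 1


def xnor(a, b):
    """XNOR of two single bits."""
    return 1 - (a ^ b)


# RAM contents: the address is the current state, the data is the next state.
ram = []
for state in range(1 << N):
    feedback = xnor((state >> 3) & 1, (state >> 2) & 1)   # taps chosen for illustration
    ram.append(((state << 1) & MASK) | feedback)          # shift, then insert feedback at the LSB

state = 0
sequence = []
while state not in sequence:       # run until the sequence starts to repeat
    sequence.append(state)
    state = ram[state]
```

With these taps the register steps through 15 of the 16 possible states before repeating. The all-ones state maps to itself and is never reached from zero, which is the usual behaviour of XNOR feedback.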

EXAMPLE 6

Pattern Generator

A pattern generator/signal generator is another simple application and typically uses one cell as a counter that is routed to a second cell, whose data is programmed to generate the required pattern according to the state of the counter. Thus, considering any function f(t) that can be calculated or plotted, the counter outputs successive values of t, whilst the second cell stores for each value of t fed to its address bus the value of the function f(t). The second cell can be configured for storing in different sections of RAM data relating to different functions, these being selected as required by a selection code fed to other free address bits of the cell. The counter cell, being programmable, can set different cycles for the different signals (patterns). If the number of data bits in the cell is less than the number of bits needed, another cell can be used to generate the other part of the signal: one generating the MSB part of the signal and the other generating the LSB part.
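The two-cell arrangement can be sketched as follows. The Python model is illustrative only: the cycle length of 16, the choice of a sine and a ramp as the stored functions, the 8-bit output range and the function name run are all assumptions made for the example.

```python
import math

STEPS = 16                          # counter cycle length (illustrative)

# First cell: a counter, coded as address -> address + 1 with wrap-around.
counter_ram = [(t + 1) % STEPS for t in range(STEPS)]

# Second cell: one RAM section per function, selected by a free address bit.
sine = [round(127 + 127 * math.sin(2 * math.pi * t / STEPS)) for t in range(STEPS)]
ramp = [t * 255 // (STEPS - 1) for t in range(STEPS)]
pattern_ram = sine + ramp           # select = 0 -> sine, select = 1 -> ramp


def run(select, clocks):
    """Clock the counter and read f(t) from the selected RAM section."""
    t, out = 0, []
    for _ in range(clocks):
        out.append(pattern_ram[select * STEPS + t])
        t = counter_ram[t]
    return out


ramp_out = run(select=1, clocks=4)
```

Changing the selection code switches the output to the other stored function without reloading the RAM, and recoding the counter cell changes the cycle length.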

Various examples will now be presented using the cell 120 illustrated in FIG. 4.

EXAMPLE 7

Tri-state ability of the cell

FIG. 14 shows schematically a pair of cells 180a and 180b comprising two RAMs 181a and 181b having respective data output buses 182a and 182b, both of which can be driven into tri-state. It is advisable to use the tri-state ability of the cell when multiplexers are needed. In a multiplexer implementation, the outputs of the cells that are to be connected are routed via the matrix to the same point. The RAM 181a has an (n+m)-bit address bus 183a and 183b whose respective address lines are latched by a pair of latches 184a and 184b, whilst the RAM 181b has a k-bit address bus 183c whose respective address lines are latched by a latch 184c. It should be noted that the apparent difference between the latch configurations of the two RAMs exists only on paper since k may be equal to m+n and the latches 184a and 184b would then operate as a single k-bit latch analogous to the latch 184c. The output enable signals applied to the two RAMs 181a and 181b select which one is active. The output enable signals are routed via the matrix as well.
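The multiplexer behaviour obtained from the tri-state outputs can be modelled as below. This Python sketch is a simplified illustration: a tri-stated output is represented by None, the RAM contents are arbitrary example values, and the function names cell_output, shared_bus and mux are hypothetical.

```python
HI_Z = None    # models a tri-stated (undriven) output


def cell_output(ram, address, output_enable):
    """Return the RAM data when enabled, otherwise a high-impedance output."""
    return ram[address] if output_enable else HI_Z


def shared_bus(*outputs):
    """Resolve outputs routed to one point; at most one driver may be active."""
    drivers = [o for o in outputs if o is not HI_Z]
    if len(drivers) > 1:
        raise ValueError("bus contention: more than one cell enabled")
    return drivers[0] if drivers else HI_Z


ram_a = [10, 11, 12, 13]     # example contents standing in for RAM 181a
ram_b = [20, 21, 22, 23]     # example contents standing in for RAM 181b


def mux(select, addr_a, addr_b):
    """Complementary output enables ensure that exactly one RAM drives the bus."""
    return shared_bus(cell_output(ram_a, addr_a, select == 0),
                      cell_output(ram_b, addr_b, select == 1))
```

For example, mux(0, 1, 2) returns the data of the first RAM at address 1, while mux(1, 1, 2) returns the data of the second RAM at address 2. Enabling both outputs at once is rejected in the model, reflecting the requirement that the output enable signals select only one active RAM.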

With reference to FIG. 9, the number Z of the matrices 155 in the middle column is the critical number to prevent cross-connection limitations. However, when the outputs of some cells are to be connected to the same point, lines may be freed in this middle column. Thus, the cells 180a and 180b can be connected via a matrix in the first column to a common output thereof, allowing the common output of this matrix to be connected to an output of the matrix 156 via only a single matrix in the second column and thus using fewer lines thereof. This is preferable to connecting each of the cells 180a and 180b to respective lines in the middle matri