United States Patent 6970966
Gemelli et al.
November 29, 2005

Title

System of distributed microprocessor interfaces toward macro-cell based designs implemented as ASIC or FPGA bread boarding and relative common bus protocol

Abstract

A distributed interface between a microprocessor or a standard bus and user macro-cells belonging to an ASIC, FPGA, or similar silicon device includes a main module connected on one side to the microprocessor bus and on the other side to a COMMON-BUS inside the interface, to which a cluster of peripheral modules is attached. The peripheral modules are also connected to the user macro-cells through multiple point-to-point buses that transfer signals in both directions. Each peripheral module encompasses a set of hardware and firmware resources such as registers, counters, synchronizers, and dual port memories (e.g. RAM, FIFO), either synchronous or asynchronous with respect to the macro-cell clock. Subsets of the standard resources are configured differently in each peripheral module in accordance with the specific needs of the user macro-cells.


Inventors: Gemelli; Riccardo (Milan, IT), Pavesi; Marco (Pavia, IT), De Blasio; Giuseppe (Rome, IT)
Assignee: Italtel S.p.A. (Milan, IT)
Appl. No.: 091530
Filed: March 7, 2002
Foreign Application Priority Data

Mar 15, 2001 [GB] 0106407

Current U.S. Class: 710/305; 712/242
Field of Search: 710/305,300,22,52 370/423,463,465,257 326/37-41,62 713/501,502 712/10,11,13,15,32,242 709/253 711/104 361/729,737

U.S. Patent Documents
4963768    October 1990      Agrawal et al.
6122747    September 2000    Krening et al.
6272151    August 2001       Gupta et al.
Foreign Patent Documents
0 994 418    Apr., 2000    EP
2 349 487    Nov., 2000    GB
Primary Examiner: Ray; Gopal C.
Attorney, Agent or Firm: Birch, Stewart, Kolasch & Birch, LLP

Claims


What is claimed is:
1. An interface between a microprocessor, or a local bus, and user macro-cells, being a macro-cell a self-consistent and pre-verified set of logic elements, generally designed with Hardware Description Languages, and transposed into silicon devices, the interface including a unique module connected between an external bus corresponding to the bus of said microprocessor, or being the local bus, and the remaining part of the interface, via an internal bus of the interface, said remaining part of the interface including peripheral resources, connected to the internal bus, named hereinafter common bus, and controlled by the unique module, named hereinafter main module, for the execution of read/write commands of the microprocessor towards a selected peripheral resource, wherein said peripheral resources are clustered in standardizable peripheral modules located externally to the interfaced macro-cells and connected to the macro-cells through point-to-point buses, being each peripheral module in its turn constituted of a pre-defined set of hardware and firmware resources comprehensive of the most popular needs in interfacing user macro-cells, including: configuration registers, command registers, status registers, not prefetchable registers for trapping events signalled by macro-cells, event counter registers, register synchronizers, dual port memories and FIFO either synchronous or asynchronous with respect to the clock that synchronizes the macro-cells.

2. A microprocessor, or local bus, interface in accordance with claim 1, wherein the pre-defined set of resources are partitioned inside each peripheral module into subsets of homogeneous resources, being each subset optionally equipped in accordance with specific needs of respective interconnected user macro-cells.

3. A microprocessor, or local bus, interface in accordance with claim 2, wherein said subsets of pre-defined set of resources include means for calculating a remote filling status of the associated resources expressed as a number of read/write individual transactions before a resource either becomes empty in a read transaction or full in a write transaction, said means for calculating the remote filling status being arranged to limit the calculated value at an integer greater than or preferably equal to the maximum round-trip delay of the transactions through the common bus expressed in number of master clock cycles.
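
The limiting behaviour recited in claim 3 can be illustrated with a minimal Python sketch. It is not part of the claimed subject matter; the function name `remote_filling_status` and its parameters (`room`, `max_round_trip_cycles`, `cap`) are hypothetical names chosen for illustration, and the sketch assumes the limit is applied as a simple clamp.

```python
def remote_filling_status(room, max_round_trip_cycles, cap=None):
    """Return the remote filling status reported by a peripheral resource.

    room: number of individual transactions still possible before the
          resource becomes empty (read) or full (write).
    max_round_trip_cycles: maximum round-trip delay through the common bus,
          in master clock cycles.
    cap: the limiting integer; assumed here to default to the preferred
          value, i.e. exactly the maximum round-trip delay.
    """
    if cap is None:
        cap = max_round_trip_cycles
    if cap < max_round_trip_cycles:
        # The claim requires a limit greater than or equal to the round trip.
        raise ValueError("cap must be >= max_round_trip_cycles")
    if room < 0:
        raise ValueError("room cannot be negative")
    # Limit the calculated value at the cap (assumed to be a plain clamp).
    return min(room, cap)
```

With a round trip of 4 cycles, a resource with room for 10 transactions would report 4, while a resource with room for 2 would report 2.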

4. A microprocessor, or local bus, interface in accordance with claim 3, wherein said main module includes means for currently tracing the remote filling status received from a selected peripheral resource in order to compensate locally to the main module the subsequent latency other than the initial delay, and signalling on the external bus either the existence of room for transferring data or the opposite condition to terminate a current transaction.

5. A microprocessor, or local bus, interface in accordance with claim 4, wherein said means for currently tracing the remote filling status received from a selected resource includes in its turn: means for memorizing the received filling status for two consecutive master clock cycles; means for calculating a delta filling status between the received filling status at the current master clock cycle and the previously memorized filling status value, in so detecting variation of residual room either for write or read a datum in/from a not prefetchable peripheral resource; means for calculating a local filling status by summing up the received filling status at the actual master clock cycle with the delta filling status and further subtracting a unity value in case a datum has been transferred from the external bus to the main module during a write or when the acknowledge to a read operation has been received from the common-bus; arithmetic means for saturating the local filling status at the number that can be represented with that bus width; and means for detecting either a positive or null value of the local filling status as conditions for respectively enabling or terminating a current data transfer to/from the external bus.
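
The arithmetic recited in claim 5 can be sketched in Python as follows. This is one literal reading of the claim, not a definitive implementation: the function name `local_filling_status` and its parameters are hypothetical, the lower clamp at zero is an assumption (the claim only recites saturation at the largest value representable on the bus), and the previously memorized value is passed in explicitly rather than held in a register.

```python
def local_filling_status(received, previous, datum_transferred, bus_width):
    """Compute the local filling status for one master clock cycle.

    received: remote filling status received at the current cycle.
    previous: remote filling status memorized at the previous cycle.
    datum_transferred: True if a datum was transferred from the external bus
        during a write, or a read acknowledge arrived from the common bus.
    bus_width: width in bits of the filling status field.

    Returns (local_status, enable), where enable is True when the transfer
    may continue and False when it must be terminated.
    """
    # Delta between the current and the previously memorized filling status.
    delta = received - previous
    # Sum the received status with the delta, minus one per transferred datum.
    local = received + delta - (1 if datum_transferred else 0)
    # Saturate at the largest value representable with the given width.
    local = min(local, (1 << bus_width) - 1)
    # Assumption: negative results are clamped to the null value.
    local = max(local, 0)
    # A positive value enables the transfer; a null value terminates it.
    return local, local > 0
```

For example, with a 4-bit status field, a received value of 2 following a memorized value of 3, together with a transferred datum, gives a local status of 0 and terminates the transfer.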

6. A microprocessor, or local bus, interface in accordance with claim 2, wherein a first subset of said pre-defined set of resources includes a bank of memory registers comprising: configuration registers, command registers, event counter registers, status registers, read-and-reset registers.

7. A microprocessor, or local bus, interface in accordance with claim 6, wherein said first subset of resources further includes: multiplexing and demultiplexing means for selecting a specific register inside the bank of registers and transmitting towards said peripheral controller the register bank's filling status concerning read/write transactions, taken alone, or either accompanied with read or write data word from/to the selected register; and being a served user macro-cell connected to one or more registers of the bank through point-to-point buses for writing or reading the various registers autonomously from the peripheral controller.

8. A microprocessor interface in accordance with claim 7, wherein a not prefetchable read-and-reset register is written from the macro-cells and successively read and reset from the main module in two separate steps that turn the register into a prefetchable one, being a first step devoted to read the register content and transfer it inside the centralized FIFO memory without altering the content into the read-and-reset register, and a second step for clearing up the content previously written into the register upon a condition that the content has been definitely transferred to the external bus agent.

9. A microprocessor interface in accordance with claim 6, wherein when the clock of an interfaced macro-cell, named hereinafter appl_clk, is asynchronous with respect to the common bus clock, named hereinafter cbus_clk, the latter timing the subset of register resources, then suitable synchronization circuits are embedded in corresponding peripheral modules and cascaded to as many registers connected to the asynchronous macro-cells, in order to ensure reliable communication between the two asynchronous clock domains at the two sides of the synchronization circuits.

10. A microprocessor interface in accordance with claim 9, wherein a synchronized status register between two asynchronous clock domains includes: first select and hold means for the selection of either a status word incoming from a macro-cell belonging to the appl_clk clock domain, or a previous stored status word outputted from the same means in the same clock domain; first set-reset flip-flop means for the generation of a control signal of said first select and hold means both enabling the selection of one said incoming status word in presence of a write strobe accompanying the status word and blocking the selection of new status words until the reception of a release command; a first strobe synchronization circuit able to transfer a signal between two asynchronous clock domains for transferring said write strobe towards the cbus_clk clock domain; second select and hold means controlled by the write strobe transferred into the cbus_clk clock domain for selecting, storing, and forwarding towards the common bus either a status word outputted from said first select and hold means or the previous content of the same second select and hold means; and a release circuit for generating the release command for removing the blocked condition settled by said first flip-flop means allowing the acquisition of a new status word incoming from the macro-cell.

11. A microprocessor interface in accordance with claim 10, wherein said release circuit includes: first D flip-flop means for sampling the write strobe transferred into the cbus_clk clock domain; and a second strobe synchronization circuit identical to the first one to transfer the sampled write strobe towards the appl_clk clock domain and obtain said release command.

12. A microprocessor interface in accordance with claim 10, wherein said release circuit includes a second strobe synchronization circuit identical to the first one to transfer a read strobe incoming from the common bus towards the appl_clk clock domain, obtaining said release command.

13. A microprocessor interface in accordance with claim 10, wherein said release circuit includes: first D flip-flop means for sampling the write strobe transferred into the cbus_clk clock domain; a two-inputs select means for selecting either the sampled write strobe or a read strobe incoming from the common bus, obtaining a selected write strobe; and a second strobe synchronization circuit identical to the first one to transfer the selected write strobe towards the appl_clk clock domain, obtaining said release command.

14. A microprocessor interface in accordance with claim 10, wherein said synchronized status register further includes a two-inputs AND gate receiving at one input a signal indicating said blocked condition settled by said first set-reset flip-flop means and at the other input a bit mode for either send back the signal indicating the blocked condition towards macro-cell, or not.

15. A microprocessor, or local bus, interface in accordance with claim 2, wherein a second subset of said pre-defined set of resources includes prefetchable memory resources.

16. A microprocessor, or local bus, interface in accordance with claim 15, wherein said second subset of resources, constituted by prefetchable memory resources, includes a dual port memory having: a first read/write port connected to a read/write port of the peripheral controller for transmitting the filling status concerning read/write transactions and transferring data on the two directions of the common bus; and a second read/write port connected to a user macro-cell through a point-to-point bus for independently transferring data from/to the connected macro-cell.

17. A microprocessor, or local bus, interface in accordance with claim 2, wherein a third subset of said pre-defined set of resources includes not prefetchable memory resources.

18. A microprocessor, or local bus, interface in accordance with claim 17, wherein said not prefetchable memory resource includes: a first peripheral FIFO memory having a read port connected to a read port of said peripheral controller for transmitting the filling status and transferring read data on the common bus, and a write port for receiving write data originated from the user macro-cell through an interconnected point-to-point bus; and a second peripheral FIFO memory having a write port connected to a write port of said peripheral controller for transmitting the filling status and receiving write data from the common bus, and a read port for transfer read data to the user macro-cell through an interconnected point-to-point bus.

19. A microprocessor, or local bus, interface in accordance with claim 2, wherein a third subset of said pre-defined set of resources includes: a modified FIFO peripheral resource, either synchronous or asynchronous with respect to the clock that synchronizes the interconnected macro-cell, coupled to a logic for preventing the read data be overwritten inside the modified FIFO until they are definitely transferred to the external bus; and a not prefetchable memory resource.

20. A microprocessor, or local bus, interface in accordance with claim 19, wherein: said modified FIFO peripheral resource includes a read port connected to a read port of said peripheral controller for transmitting the filling status and read data on the common bus, and a write port for receiving write data originated from the user macro-cell through an interconnected point-to-point bus; and said not prefetchable memory resource includes a known peripheral FIFO memory having a write port connected to a write port of said peripheral controller for transmitting filling status and receiving write data from the common bus, and a read port for transfer read data to the user macro-cell through an interconnected point-to-point bus.

21. A microprocessor, or local bus, interface in accordance with claim 19, wherein said modified FIFO peripheral resource includes a synchronous dual port RAM for writing and reading data under control of an interconnected prefetchable FIFO controller capable to manage the dual port RAM as a circular buffer keeping stored in the RAM and conditionally not overwriteable said data popped out from the modified FIFO read port to be copied into said centralized FIFO memory, until the reception at the modified FIFO read port either a flush or clear-and-flush command, both accompanied with an associated value indicating the number of words to be flushed out because definitely transferred to the external bus, and the clear-and-flush command additionally indicating the end of the current transaction, in a way that the tandem of the modified FIFO and the centralized FIFO is suitable to allow anticipative reads for reducing subsequent latency in burst transactions without losing read data when the centralized buffer input is switched to service another peripheral subset.

22. A microprocessor interface in accordance with claim 21, wherein said prefetchable FIFO controller further includes a pointer-generating and handling logic prompted by said flush or clear-and-flush command and by additional pushword and popword commands to respectively write data into the modified FIFO write port and read data from the read port and consequently setting up: an aligned read pointer for tracing a readable location in the circular buffer adjacent to the location of the last datum overwriteable by effect of said flush command, in such a way that the possible data overwriteable are the only data definitely transferred to the external bus; and a misaligned read pointer for tracing a readable location in the circular buffer adjacent to the location of the last datum copied into the centralized buffer by effect of multiple popword commands, in such a way that in the same transaction no data are read more than once; being the misaligned read pointer aligned to the aligned read pointer location at the reception of the clear-and-flush command; a write pointer for tracing the next writeable location into said circular buffer.
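
The pointer handling recited in claims 21 and 22 can be illustrated with a minimal Python sketch. It is a behavioural model under stated assumptions, not the claimed hardware: the class name `PrefetchableFifo`, its attribute names, and the use of two occupancy counters are hypothetical choices made for illustration. The method names follow the pushword, popword, flush and clear-and-flush commands of the claims.

```python
class PrefetchableFifo:
    """Circular buffer whose popped words stay stored until flushed."""

    def __init__(self, depth):
        self.depth = depth
        self.ram = [None] * depth
        self.write_ptr = 0      # next writeable location
        self.aligned_rd = 0     # first location not yet flushed
        self.misaligned_rd = 0  # next location to pop in this transaction
        self.stored = 0         # words written and not yet flushed
        self.unread = 0         # words written and not yet popped

    def pushword(self, word):
        # Only flushed locations may be overwritten.
        if self.stored == self.depth:
            raise OverflowError("no overwriteable location")
        self.ram[self.write_ptr] = word
        self.write_ptr = (self.write_ptr + 1) % self.depth
        self.stored += 1
        self.unread += 1

    def popword(self):
        # Reads advance only the misaligned pointer; the data stay in RAM.
        if self.unread == 0:
            raise IndexError("nothing left to read")
        word = self.ram[self.misaligned_rd]
        self.misaligned_rd = (self.misaligned_rd + 1) % self.depth
        self.unread -= 1
        return word

    def flush(self, count):
        # Release 'count' words definitely transferred to the external bus.
        if count > self.stored - self.unread:
            raise ValueError("cannot flush words that were never popped")
        self.aligned_rd = (self.aligned_rd + count) % self.depth
        self.stored -= count

    def clear_and_flush(self, count):
        # End of transaction: flush, then realign the misaligned pointer so
        # popped-but-undelivered words become readable again.
        self.flush(count)
        self.misaligned_rd = self.aligned_rd
        self.unread = self.stored
```

In this model, if three words are popped in advance but only one reaches the external bus, `clear_and_flush(1)` frees a single location and the next `popword()` returns the second word again, so no read data are lost.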

23. A microprocessor interface in accordance with claim 22, wherein said pointer-generating and handling logic includes the following means for generating a rewind command active on the aligned read pointer in concomitance with the execution of each flush command of the current transaction: a popword counter incremented by the popword commands and reset by the flush command; first subtracting means for subtracting the number of flushed out words from the value reached by the popword counting, obtaining a number of rewindable memory locations into the circular buffer whose content is prevented to be overwritten; and second subtracting means for subtracting the number of rewindable memory locations from the current value of the aligned read pointer, in a way to update the offset between aligned and misaligned read pointers.

24. A microprocessor interface in accordance with claim 23, wherein said pointer-generating and handling logic further includes: a first presettable up/down counter for tracing the number of readable locations into said circular buffer, coupled to a supervisor control logic for generating an aligned counting value which counts data available for reading in the circular buffer considering readable data unread from the centralized buffer since last flush command and the rewinded data to the last flush, the aligned counting value being suitable for generating said aligned read pointer; a second presettable up/down counter for tracing the number of readable locations into said circular buffer, coupled to the supervisor control logic for generating a misaligned counting value which counts data available for reading in the circular buffer considering readable only data not yet read from the centralized buffer, the misaligned counting value being suitable for generating said misaligned read pointer; and a third presettable up/down counter for tracing the number of writeable locations into said circular buffer, coupled to the supervisor control logic for generating a writing counting value suitable for generating said write pointer.

25. A microprocessor interface in accordance with claim 24, wherein said prefetchable FIFO controller further includes circular buffer interfacing means connected between said pointer-generating and handling logic and said dual port RAM for receiving both said misaligned read pointer and the write pointer and generate correspondent writing and reading addresses for the dual port RAM, and further to transmit write data and receive read data to/from the dual port RAM.

26. A microprocessor, or local bus, interface in accordance with claim 1, wherein said main module includes means for generating commands on the common bus arranged in a way to generate a command consisting of a pure query issued to a selected peripheral resource to return a remote filling status value, or a read-and-query, or a write-and-query, whose argument is a datum coupled with said remote filling status.

27. A microprocessor, or local bus, interface in accordance with claim 1, wherein said main module further includes: a centralized FIFO memory acting as a buffer shared among the peripheral modules for transitorily storing a burst of data read in advance from a peripheral selected subset of resources in correspondence of a command asserted to transfer data on the external bus; centralized FIFO memory control means in its turn including: a) means for calculating the filling status of the centralized FIFO memory in order to verify room to accept new data; b) first counter and comparator means for counting the number of data words transferred from a selected subset of peripheral resources to the centralized FIFO memory and comparing a first counting value issued from said first counter means with a first prefixed threshold placed near the filling boundary of the centralized FIFO memory in order to avoid buffer overflow; c) means for purging the unread data from the centralized FIFO memory at the end of a current transaction towards the external bus in order to assign an empty buffer at the successive requester; and second counter and comparator means for counting the number of data words transferred from the centralized FIFO memory to the external bus and compare a second counting value issued from said second counter means with a parameter indicating the length of a read data burst, in order to detect equality as a condition to terminate a current read burst transaction from the centralized FIFO memory to the external bus.

28. Microprocessor, or local bus, interface in accordance with the claim 27, wherein each standardizable peripheral module, further includes a peripheral controller including in its turn: first peripheral selector means for transferring towards the common bus either read data and the associated remote filling status, or the filling status only, returned from an addressed resource; distributed address decoding means receiving said full-range and lower-range address buses of the common bus to complete the second address decoding step for the selection of a memory port or FIFO port or a register port inside the subset; protocol command decoding and sequencing means connected to the common bus to decode and properly time and forwards commands issued by the main module, and generate for each decoded command a respective peripheral monitoring code suitable to inform the main module about the operational state of the addressed resources; protocol control means cascaded to the preceding sequencing means for generating control and selection signals selectively forwarded to the subset of resources to address them for executing the decoded commands on the addressed resources; second peripheral selector means to forward towards the common bus the transaction requests generated from the macro-cells and selected parameters required by a specific grant cycle involving burst transactions; and means for introducing a prefixed delay between its input signals constituted by the read data, the filling status and the burst transaction parameters, before transferring them on the common bus, being the delay equal to their known initial latency, in order to synchronize them with the peripheral monitoring codes.

29. A microprocessor interface in accordance with claim 28, wherein said protocol command decoding and sequencing means is further equipped with a peripheral partial flow-control-logic for receiving the remote filling status selected by said first peripheral selector means and decode it to return peripheral monitoring code to the main module either: a datum coupled with the value of the remote filling status, the peripheral monitoring code indicating a data-and-query payload, in correspondence of both a read-and-query decoded command and a remote filling status greater than zero; or the only remote filling status null in correspondence of both a read-and-query decoded command and a remote filling status null coupled with the peripheral monitoring code indicating a pure query payload.

30. A microprocessor, or local bus, interface in accordance with claim 27, wherein said two-way transaction control means belonging to the main module includes centralized logic means for synchronizing the transactions between the external bus and the common bus by exploiting sub-means suitable for the following operations: detecting start/end transactions on the two buses; detecting the insertion of wait states on the two buses; detecting availability of data on the two buses; counting the data words transferred to the external bus; sustaining a direct memory access originated by an transaction requester internal to a respective user macro-cell to transfer data from the macro-cell towards a device connected to the external bus, being the transaction requester connected to the remote register subset for transferring parameters suitable to sustain the DMA transaction; storing said parameters suitable to sustain DMA burst transactions; monitoring the number of data currently stored in the centralized FIFO memory; monitoring the operations requested by an external bus protocol agent and promoting corresponding operations inside said main module of the interface; receiving and interpreting peripheral monitoring codes transmitted from the remote peripheral resources; and receiving and interpreting a set of monitoring signals concerned transactions between the main module of the interface and the external bus.

31. A microprocessor, or local bus, interface in accordance with claim 30, wherein said two-way transaction control means belonging to the main module further includes the following control means for co-operating with said centralized logic means: a main common bus sequencer for generating command on the common bus; a main handshake sequencer for generating status signals and commands towards said plug-in means, directed to both the embedded address/command generator and the external bus sequencer; and a main module controller for generating control signals for all the means embedded in the main module of the interface other than the two-way transaction control means.

32. A microprocessor, or local bus, interface in accordance with claim 30, wherein the output of said centralized FIFO memory is further connected to: said first interface circuit means to pop out a datum indicating the source starting address of the reading burst; said address/command generator to pop out a datum indicating the destination starting address of the reading burst; and a direct memory access controller embedded in said centralized logic means to pop out a datum indicating the length of a read burst executed exploiting the centralized FIFO memory.

33. A microprocessor, or local bus, interface in accordance with claim 1, wherein said main module further includes: first interfacing circuit means connected to the external bus for receiving, transmitting, buffering, sampling, or holding address, data and control signals from/to the external bus; second interfacing circuit means connected to the common bus for receiving, transmitting, buffering, sampling, or holding address, data, and control signals from/to the common bus; address translator means cascaded with centralized address decoder means the cascade being connected between said first and second interfacing circuit means to select a peripheral resource for enabling a communication path between the microprocessor, or local bus, and an interconnected macro-cell; address space configuration means, active at the boot time after power up, to supply a granular memory map of the interface to said centralized address decoder means which is enabled consequently to translate the linear address space of the external bus into a structured address space of the common bus according to the physical topology and implemented by forwarding to the cluster of peripheral modules a full-range address bus accompanied with a lower-range sub-bus for the selection of said peripheral modules and embedded resources through a two-steps address decoding, being the first step charged to said centralized address decoder means; internal arbitration means for receiving transaction requests from peripheral modules connected to transaction requesters internal to the respective user macro-cells, and generating a grant signal towards a selected requester to command the requester to transfer parameters suitable to sustain DMA burst transaction; and plug-in means for de-coupling the protocol active on the external bus from the protocol active on the common bus; two-way transaction control means connected to the preceding means and promoting a communication protocol active on the common bus for allowing ordinate transactions through the two directions of the interface.

34. A microprocessor, or local bus, interface in accordance with claim 33, wherein said internal arbitration means is arranged in a way to generate an authorisation signal to enable said plug-in means to compete before an external arbiter for gaining the main module master-ship of the external bus, upon condition that at least one of said transactions request is detected at the input during the current master clock cycle, and for enabling the issuing of said grant signal to a selected requester on the basis of the following two alternative embodiments: at the only reception of said transaction requests; or upon detection of a back confirmation signal stating that the main module have gained the master-ship of the external bus.

35. A microprocessor, or local bus, interface in accordance with claim 34, wherein said internal arbitration means includes: a time slice counter for measuring the duration of a granted service period; a circular queue to push sequentially all service requests coming from user macro-cells; a scheduler to pop out one first request for service in conformity to a policy First-Come-First-Served, being the popped request signaled through the assertion of a correspondent grant signal; a priority ROM to store at the i-nth location a number of clock pulses that a peripheral resource i-nth shall keep the grant asserted; and a data comparator to compare, at every clock time, the counting of time slice counter with the contents of the i_nth row of the priority table corresponding to the actual granted request i, in order to stop the current data transfer in case of coincidence, and command the scheduler to pop out the next request for service at the i+1_nth row.
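
The arbitration elements recited in claim 35 (time slice counter, circular queue, First-Come-First-Served scheduler, priority ROM and comparator) can be illustrated with a minimal Python sketch. It is a behavioural model under stated assumptions: the class name `TimeSliceArbiter` and its method names are hypothetical, the circular queue is modelled with a `deque`, and the next grant is assumed to be issued in the same clock cycle in which the previous one expires.

```python
from collections import deque


class TimeSliceArbiter:
    """First-Come-First-Served arbiter with per-requester time slices."""

    def __init__(self, priority_rom):
        # priority_rom[i]: clock pulses requester i may keep the grant.
        self.priority_rom = list(priority_rom)
        self.queue = deque()    # pending service requests, in arrival order
        self.granted = None     # index of the currently granted requester
        self.slice_counter = 0  # duration of the current service period

    def request(self, requester):
        # Push a service request coming from a user macro-cell.
        self.queue.append(requester)

    def tick(self):
        """Advance one clock cycle and return the granted requester."""
        if self.granted is not None:
            self.slice_counter += 1
            # Comparator: stop the transfer when the slice is used up.
            if self.slice_counter == self.priority_rom[self.granted]:
                self.granted = None
        if self.granted is None and self.queue:
            # Scheduler: pop the oldest request and assert its grant.
            self.granted = self.queue.popleft()
            self.slice_counter = 0
        return self.granted
```

With a priority table of `[2, 3]` and requests arriving from requester 1 and then requester 0, successive calls to `tick()` grant requester 1 for three cycles, then requester 0 for two cycles, and then return `None`.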

36. A microprocessor, or local bus, interface in accordance with claim 33, wherein said plug-in means includes the following means: a command decoder for decoding commands received from the external bus and forward it to said two-way transaction control means; an address/command generator for generating address and commands to be issued on the external bus for read/write data at external destination addresses; and an external bus sequencer connected to both the command decoder and the address/command generator suitable to implement the communication protocol active on the external bus and communicate with an external arbiter for requesting the main module master-ship of the external bus.

Description

FIELD OF THE INVENTION

The present invention relates to the field of microprocessor interfacing, and more precisely to a system of distributed microprocessor interfaces toward macro-cell based designs implemented as ASIC or FPGA bread boarding, and to the related COMMON-BUS protocol.

BACKGROUND OF THE INVENTION

The huge growth in the performance and gate density of modern gate arrays has made very complex design implementations possible. The resulting challenge is to develop fully verified complex designs while still granting a short time to market. The current tendency is to manage complexity by exploiting the CAD (Computer Aided Design) tools available on the market, which promote large net-list integration directly on a silicon die, i.e. Field Programmable Gate Arrays (FPGA) or Application Specific Integrated Circuits (ASIC). A parallel way to master the increasing complexity of designs and to ease their verification is to split them into macro-cells: self-consistent, pre-verified design elements that, connected as "virtual components", build the system.

Macro-cells can be either developed by the user or bought on the market as Intellectual Property (IP). IP is sold as both hard macros and soft macros. Hard macros, or hard cores, are predefined logic blocks tied to a specific technology (geometric boundaries for logic functions are part of the description), with an accurate timing specification, that a user can simply drop into a chip. Soft macros, or soft cores, are predefined portable logic blocks (geometric boundaries for logic functions are not part of the description) not bound to a specific technology. Soft macros are almost always described using a Hardware Description Language (HDL). The user has to use a synthesis tool to create the gate-level representation of the soft core and to target it to a specific technology. Technology vendors sell both hard macros and soft macros, while third-party vendors can only offer soft macros.

At soft-core level a macro-cell can be defined as a self-consistent design element with the following properties: 1. it is able to implement a well defined behavior (macro-cell function); 2. its behavior is generally described by means of a Hardware Description Language (HDL) such as VHDL (Very High Speed Integrated Circuit Hardware Description Language) or Verilog; 3. it can be composed of several primitives (e.g. memories, etc.); 4. it is linked to other components by means of a limited, well specified series of interfaces; 5. it is suitable to be technologically remapped without ANY changes in the description; 6. it is pre-verified at logical (no timing) level, that is, it has been verified by simulations that it effectively performs the macro-cell function.

A generic known Macro-Cell (MC) is shown in FIG. 1. Primary Inputs (PI) are located on the left side while Primary Outputs (PO) are located on the right side. Those inputs and outputs have to deal with the function performed by the macro-cell, which is the reason why the macro-cell has been designed.

At the topside are present Configuration and Control inputs and outputs used to: configure the macro-cell for proper operation, receive commands indicating operations to perform, monitor the status of the macro-cell to verify proper operation.

All these inputs and outputs need support registers (flip-flops): this means that configurations are memorized into configuration registers, commands are evaluated in command registers, status are memorized into status registers. These support registers are implemented in a portion of the macro-cell named MACRO-CELL LOGIC which hosts the so-called "glue logic" (generic combinatorial and/or sequential logic networks) plus control FSM (Finite States Machines).

A set of memory based devices (pure memory and FIFO (First In First Out) buffers) is located at the bottom side: WRITE FIFO (WFi), READ FIFO (RFi) and MEMORY (Mi). These devices are mainly used to buffer data stream flows transmitted/received from the macro-cell via the microprocessor interface or the local bus interface. In most cases these stream data flows are constituted by functional data (data which have to deal with the macro-cell function). To control these memory based devices some control FSMs are needed, in particular: an FSM based control block for the WRITE FIFO named WRITE FIFO CONTROLLER, an FSM based control block for the READ FIFO named READ FIFO CONTROLLER and an FSM based control block for the MEMORY named MEMORY CONTROLLER. These control machines are embedded into the MACRO-CELL LOGIC block. The MEMORY LMj (LOCAL MEMORY j), shown at the topside, is local to the macro-cell. It is used as a storage resource for the algorithm performed by the macro-cell; alternatively it implements a FIFO charged to exchange data with another macro-cell or with Primary Inputs or Primary Outputs of the device which hosts the macro-cell. An FSM based memory control block also exists for each LOCAL MEMORY block but is not drawn in FIG. 1.

It is necessary to introduce an implementation note about FIFO controllers. Usually FIFOs are realized by using dual port memories. The circular buffer realized by a FIFO is obtained by means of a circuit named FIFO CONTROLLER that manipulates the memory addresses to implement the circular list. This is the most general situation; in fact built-in FIFOs (FIFOs not based on dual port memories with an embedded controller) are not widespread, especially in microelectronics. For this reason FIFO CONTROLLERs are widely used and are also provided as cores by IP vendors. Moreover, several types of FIFO CONTROLLERs exist. A SYNCHRONOUS FIFO (a FIFO where read port and write port are operated with the same clock) can be realized using a SYNCHRONOUS FIFO CONTROLLER. An ASYNCHRONOUS FIFO (a FIFO where read port and write port are operated with different, not synchronous, clocks) needs to be realized using an ASYNCHRONOUS FIFO CONTROLLER. Referring to FIG. 1, even if (as we said) in most cases FIFOs are based on dual port memories, for the sake of simplicity they are not represented as dual port memories plus a FIFO controller but as a FIFO buffer plus a FIFO controller. Even if not represented in the generic known Macro-Cell of FIG. 1, a LOCAL READ FIFO and a LOCAL WRITE FIFO, with their relative controllers, can also be present.
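By way of illustration only, the address manipulation performed by a SYNCHRONOUS FIFO CONTROLLER can be sketched in software as follows. This is a minimal behavioral model, not the circuit of any particular vendor: the class name SyncFifoController, its method names and the use of a word counter for the full/empty flags are assumptions made for this sketch, and the Python list merely stands in for the dual port memory.

```python
class SyncFifoController:
    """Behavioral sketch of a synchronous FIFO controller (hypothetical names)."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth   # stands in for the dual port memory
        self.wr_addr = 0            # address driven on the write port
        self.rd_addr = 0            # address driven on the read port
        self.count = 0              # number of words currently stored

    def is_empty(self):
        return self.count == 0

    def is_full(self):
        return self.count == self.depth

    def write(self, word):
        """Store one word; return False when the FIFO is full."""
        if self.is_full():
            return False
        self.mem[self.wr_addr] = word
        # Wrapping the address is what turns the memory into a circular list.
        self.wr_addr = (self.wr_addr + 1) % self.depth
        self.count += 1
        return True

    def read(self):
        """Return the oldest word, or None when the FIFO is empty."""
        if self.is_empty():
            return None
        word = self.mem[self.rd_addr]
        self.rd_addr = (self.rd_addr + 1) % self.depth
        self.count -= 1
        return word
```

An ASYNCHRONOUS FIFO CONTROLLER would additionally have to pass the read and write addresses across the two clock domains, which this single-clock sketch does not attempt to model.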

A macro-cell based design is a design developed by the user as: a set of user developed Macro-Cells, macro-cells bought as intellectual properties IPi, MEMORY resources Mi or LMj, READ FIFO RFi and WRITE FIFO WFi (First In First Out buffers), connected together. This design constitutes a system or subsystem that can be physically implemented either on a board of FPGA or an ASIC.

While short design time and high reliability are possible thanks to macro-cell oriented design, the flexibility of implemented systems is still due to the widespread microprocessor. Microprocessors are employed as configuration and control processors or also as elaboration processors. Elaboration processors are used in the system to process functional data by executing the system software (the elaboration processors belong to several categories: general purpose, micro-controllers, Digital Signal Processors (DSP), coprocessors, etc.). On the contrary, the configuration and control processor is the microprocessor charged with configuring and controlling the system; it may be coincident, partially coincident or distinct from the other elaboration processors used in the system.

The developed system, consisting of a macro-cell based design implemented on a set of ASICs or FPGAs, is generally placed on a board with at least one microprocessor acting as configuration and control processor. Here a first need arises. To make communication possible between the microprocessor and the developed components, those components need a microprocessor interface. The microprocessor interface is generally built as a macro-cell and embedded in one of the user developed components on the board; it can either be user developed or bought on the market. If the developed system is complex, and/or it needs to be modular, and/or easy to maintain, it is split over a set of boards. In the simplest implementation one of the boards hosts the main processor while the other boards do not. On the contrary, in multiprocessor systems, several boards host a microprocessor. One or more boards grouped together constitute a subsystem of the whole system.

In the described case the system is generally constituted by a rack, which is a case hosting all the boards and connecting them by a back-plane. A back-plane is a physical medium on which a bus is implemented, a bus being a shared interconnection resource accessed in parallel that allows great modularity. On the back-plane a back-plane bus is implemented. To promote common design criteria, standard back-plane buses are specified; a widely used standard is the so-called VME bus (Versa Module European, IEEE-P 1014). Generally boards directly connected to the back-plane are "intelligent", that is, each of them hosts a microprocessor. Back-plane buses allow very long buses with slow/medium throughput, but the microprocessor is generally required to control data exchange.

In very complex cases further bus based interconnection resources are present. Two main cases arise: each board either has an internal hierarchy consisting of several devices connected by a local bus laid on the board itself, or several boards are connected together by means of a local bus laid on a back-plane parallel to the one which hosts the back-plane bus. Generally, in the latter case, one of the boards connected together by said local bus is also connected to the back-plane bus and hosts a microprocessor ("intelligent" board) while the other boards do not.

Some peculiar, but very widespread, architectures also exist. In small systems that nevertheless need to be modular and/or easy to maintain, like personal computers, the architecture is based on a main-board and the back-plane bus is not present. The main-board is characterized in that it hosts a main microprocessor and a set of cards (small peripheral boards, generally without a processor) connected together and to the main microprocessor by means of a local bus laid on said main-board.

In the large variety of system implementations, another recurrent architecture based on a local bus is present. It is based on a "limited" number of boards connected by a local bus laid on a "short" back-plane. If the number of boards to connect exceeds the maximum allowed local bus length and capacitive load, then different local buses can be connected by means of local bus bridges.

While back-plane buses are used to connect "intelligent" modules (e.g. boards hosting microprocessors), local buses are generally used to allow communication between peripherals. Generally speaking a peripheral is a "stupid" device, that is, a device which does not embed a microprocessor; peripherals are usually lodged on each board or on the cards which are in turn hosted by a main-board. Again, to promote common design criteria, standard local buses are specified; a widely used standard is the so-called Peripheral Component Interconnect (PCI) bus. The PCI bus offers a processor-independent data path among peripherals and between the microprocessor and peripherals; said peripherals can be directly hosted on a board, hosted on different boards or on cards hosted in turn on a main-board. Local buses allow very short buses with very high throughput and do not require the microprocessor to control the data exchange (microprocessor independence).

Due to the microprocessor independence, local buses are generally more complex than back-plane buses in terms of protocol and more sophisticated in terms of features. Moreover, several reasons spanning from the historical ones to the implementation complexity, to the convenience of integrating a local bus interface into a peripheral device, promoted the development of several "virtual components" for local buses. Nowadays, master interfaces, slave interfaces and bridges are available on the market as IP (Intellectual Property) in form of hard and soft cores.

Here a second need arises. To make possible communication between different boards (cards) connected to the local bus each card needs a local bus interface; this local bus interface can either be a physical component or a macro-cell embedded into a component developed by the user. In the latter case the macro-cell can either be user developed or bought on the market (IP).

The basic architecture described above is present in a large variety of electronic systems belonging to almost all areas of design: Information Technology, Communication, System Automation, Space and Military Electronics and Automotive. In the control area, numerical axle control systems are characterized in that each axle has a dedicated control card and all the cards are hosted on the same board. In the communication area, telephone network switches are characterized in that each end-user has its own line termination card and a set of line termination cards is hosted by the same module. In the information technology area, a computer motherboard has several equivalent slots to host cards.

FIG. 2 shows an example of a complex bus based multi-board architecture. Five boards are present: Board00, Board01, Board02, Board10 and Board11. Boards Board00, Board01 and Board02 are connected together by a local bus named LB0 while the other two boards Board10 and Board11 are connected together by a local bus named LB1. LB0 and LB1 are physically separated, that is, no direct communication can take place between them. A back-plane bus BB spanning all the boards is present. Board02 and Board10 are directly connected to the back-plane bus BB. As a consequence boards Board02 and Board10 can communicate directly by means of BB while the other boards cannot. If boards Board00 and Board01 want to communicate with boards placed on local bus LB1, they have to pass through Board02. In the same manner, if Board11 wants to communicate with boards placed on local bus LB0, it has to pass through Board10.

The architecture of the boards is now examined. Board02 hosts a microprocessor MuP02 and a memory bank MM02 connected together by a bus uPB02 of the microprocessor MuP02. The latter is the main processor for the subsystem constituted by the group of boards (Board00, Board01 and Board02) and MM02 is the main memory for the same subsystem. The microprocessor MuP02 is interfaced to the back-plane bus BB by a microprocessor to back-plane bus bridge named uP/BB02, also connected to the bus uPB02. A microprocessor to back-plane bus bridge is a device able to allow communication between different protocols: a specific microprocessor protocol on one side and a specific back-plane bus protocol on the other side. This means that a microprocessor to back-plane bus bridge has a microprocessor bus on one side and a back-plane bus on the other side. An integrated circuit IC02 connected to the bus uPB02 embeds a local bus interface and is interfaced to the local bus LB0. Board01 hosts an integrated circuit named IC01 (either an ASIC or an FPGA), but the argument is still valid for a set of integrated circuits, directly interfaced with local bus LB0. Board00 hosts an integrated circuit named IC00 and a microprocessor named uP00 locally interfaced with the IC00 via a bus uPB00 of the microprocessor uP00 (the microprocessor can be a coprocessor). The integrated circuit IC00 is further interfaced with local bus LB0. Board10 hosts a processor MuP10 and a memory MM10 connected together by a bus uPB10 of the microprocessor MuP10. The latter is the main processor for the subsystem constituted by the group of boards (Board10 and Board11) and MM10 is the main memory for the same subsystem. The microprocessor MuP10, the memory MM10 and an integrated circuit named IC10 are connected together by a microprocessor bus named uPB10. Devices connected to uPB10 can communicate with local bus LB1 via a microprocessor to local bus bridge uP/LB10 that is connected on one side to the uPB10 bus and on the other side to local bus LB1. Moreover, the same side of the microprocessor to local bus bridge uP/LB10 that is connected to local bus LB1 is also connected to a local bus to back-plane bus bridge named LB/BB10, which in its turn is interfaced with the back-plane bus BB. Board11 hosts an integrated circuit IC11 directly interfaced with local bus LB1; a microprocessor uP11 (which can be a coprocessor) is interfaced with the same local bus via a microprocessor to local bus bridge uP/LB11 connected to a bus uPB11 of the microprocessor uP11.

To summarize: the boards belonging to the same subsystem can communicate via the local bus, while boards belonging to different subsystems can communicate via the back-plane bus.
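The connectivity of FIG. 2 summarized above can be checked with a small software sketch. The bus membership table below is taken from the description (LB0, LB1 and BB with the boards directly connected to each); the function name route and the breadth-first search used to find the hops are assumptions of this illustration and are not part of the described architecture.

```python
from collections import deque

# Boards directly connected to each bus, as described for FIG. 2.
BUSES = {
    "LB0": ["Board00", "Board01", "Board02"],
    "LB1": ["Board10", "Board11"],
    "BB": ["Board02", "Board10"],
}


def route(src, dst):
    """Return the shortest list of (bus, board) hops from src to dst.

    Hypothetical helper: a plain breadth-first search over the bus table.
    Returns None when no path exists.
    """
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        board, path = queue.popleft()
        if board == dst:
            return path
        for bus, members in BUSES.items():
            if board not in members:
                continue
            for nxt in members:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [(bus, nxt)]))
    return None


# Board00 reaches Board11 only through Board02 (on BB) and Board10 (on LB1).
print(route("Board00", "Board11"))
```

Under these assumptions a transfer inside one subsystem takes a single local bus hop, whereas a transfer between subsystems always crosses Board02 and Board10, the only boards attached to the back-plane bus BB.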

Now the detailed architecture of some of the boards shown in FIG. 2 is discussed with the goal to check the needs of each integrated circuit on the boards in terms of microprocessor interface macro-cells or local bus interface macro-cells.

In FIG. 3 the detailed architecture of Board00 (FIG. 2) is shown. With reference to the figure we see that the uP00 processor, with its local ROM (Read Only Memory) and local RAM (Random Access Memory), is directly coupled to the integrated circuit IC00; this implies that the circuit IC00 embeds a microprocessor interface uP INTERFACE block (sketched in the Figure). The macro-cells embedded into the IC00 are configured and controlled by the configuration and control processor (the latter for Board00 may be the same local uP00 on the card itself) via said microprocessor interface. Moreover the integrated circuit IC00 can communicate with the rest of the system via local bus LB0; this implies that the IC00 embeds a LOCAL BUS INTERFACE too (sketched in the Figure). It is useful to recall that configuration and control purposes generally do not require long burst transactions (small amounts of data are transferred and burst transfers are not required). On the contrary, the main purpose of standard local buses is to transfer functional data at high speed (e.g. disk data or video data on a Personal Computer), so transfers on a local bus generally involve large amounts of data in burst mode.

In FIG. 4 the detailed architecture of Board11 (FIG. 2) is shown. With reference to the figure we see that the uP11 processor, with its local ROM (Read Only Memory) and local RAM (Random Access Memory), is coupled to the circuit IC11, passing through a microprocessor to local bus bridge uP/LB11. The IC11 is directly coupled with the local bus LB1; this implies that the circuit IC11 embeds a LOCAL BUS INTERFACE (sketched in the Figure). In this case this is the only interface of the IC11 with the rest of the system; as a consequence it has to be used both for configuration and control of the macro-cells embedded into IC11 and for functional data transfer purposes. This is true in case of configuration and control performed by the main processor on Board10 (Card 10 in this context), or in case of configuration and control performed by the local processor on Board11 (Card 11 in this context). Board01 does not have a local processor, so configuration and control of the macro-cells embedded into the circuit IC01 hosted by Board01 is necessarily performed by the main processor MuP02 on Board02. As a result, as in the case of Board11, both the configuration and control flow and the functional data flow pass through the LOCAL BUS INTERFACE embedded into IC01.

Very often the system clock (the one which clocks the macro-cells of the system) and the microprocessor and/or local bus clock differ from each other; this especially happens in the communication area. This must be taken into account in both microprocessor interface and local bus interface design. Actually, in that case, said interface has to communicate with different clock domains. As is known, communication between different clock domains can take place by means of synchronization systems; this topic shall be detailed later.

Another general crucial aspect that involves all kinds of designs involving macro-cells is the availability of drivers. A driver is a small program controlling a specific device based on one or more macro-cells, or part of a macro-cell, on behalf of the Microprocessor Operating System. It constitutes an interface between hardware and high level application software. This argument shall be detailed at the end of the text.

"Classic" Architectures and Related Open Problems

Microprocessor interfaces, local bus interfaces plus FIFOs, synchronizers and some other minor parts constitute recurrent solutions. These solutions, characterized by being reusable, are implemented as macro-cells by both end users and companies that sell them as IP (Intellectual Property) macro-cells. In this section the "classic" solutions in the area of microprocessor interfaces and local bus interfaces are described and discussed, and their drawbacks highlighted. In the current approach microprocessor interfaces and local bus interfaces are realized with different macro-cells. The microprocessor interface interfaces the configuration and control processor on one side and the user macro-cells which need to be configured and controlled on the other side. The number of interfaced user macro-cells is generally high. The microprocessor interface is charged to: 1. interface a configuration and control processor; 2. interface user macro-cells performing: configuration setting (on user macro-cells), command issuing (to user macro-cells) and status retrieving (from user macro-cells).

The purposes listed at point 2 generally do not require burst access. Moreover microprocessors, in general, are not optimized for burst transfers except for the DMA (Direct Memory Access) mode. On the contrary local buses are generally specified for high performance in burst transfers.

Let's consider a board being a subsystem or the system itself. In the "classic" approach, named for simplicity Centralized Microprocessor Interface (CMI), a unique microprocessor interface block, also named CMI in the successive Figures, is present in the hierarchy of the subsystem constituted by the board. Moreover each block of each user macro-cell in the system which needs to be operated by the configuration and control processor is directly interfaced with the centralized microprocessor interface. Point 2 can be seen as a set of services offered to the software running on the configuration and control processor in terms of primitives able to operate on the interfaced user macro-cells. To implement these services a certain amount of logic, based on registers (flip-flops) and FSMs (Finite State Machines), is required; this constitutes a set of hardware primitives such as: configuration register, command register and status register. Moreover firmware has to be developed to handle the hardware primitives: this constitutes the set of firmware (software) primitives named driver. For the sake of simplicity, hereinafter, all the hardware primitives plus memories and FIFOs that can be interfaced with the CMI will be referred to as resources. Two topologies of interconnection between the CMI and user macro-cells are used: 1. A so called Centralized Multi-Port Interface (CMPI) to user macro-cells based on a set of ports, each one dedicated to a specific service like configuration, command and status retrieval. The logic implementing said services is embedded into user macro-cells and developed by users; 2. A so called Centralized Bus Based Interface (CBBI) to user macro-cells based on a bus; services like configuration, command and status retrieval are embedded into user macro-cells and have to be developed by the user in such a manner as to be consistent with the bus protocol.

The CMPI is generally implemented by end-users and concerns custom designs implemented on ASIC (Application Specific Integrated Circuit), while the CBBI is used by IP (Intellectual Property) vendors in realizing microprocessor interfaces.

FIG. 5 shows an ASIC implementation of the unique microprocessor interface block Centralized Microprocessor Interface (CMI) when it assumes the architecture of a Centralized Multi-Port Interface (CMPI). The ASIC is organized in four clusters of macro-cells plus the CMI. There are four user developed macro-cells: MC1, MC2, MC3 and MC4 and two macro-cells bought on the market, respectively IP1 and IP2. The memory resources are constituted by two local memories LM1 and LM2. The rectangles drawn close to each LOCAL MEMORY block represent the MEMORY CONTROLLERS of each LOCAL MEMORY block. The rectangles drawn into the macro-cells MC1, MC2, MC3, MC4, IP1 and IP2 represent register based hardware primitives which implement the set of services offered to the software running on the microprocessor interfaced with the CMI. From the Figure it is evident that each interfaced resource is connected point to point with the CMI.

FIG. 6 shows an ASIC implementation of the block CMI when it assumes the architecture of a Centralized Bus Based Interface (CBBI). The situation is the same as described in FIG. 5 but all the resources are connected to the CMI via a bus CMI_bus: this is a more flexible solution with respect to the Centralized Multi-Port Interface (CMPI).
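Purely as an illustration of the bus based topology, the following behavioral sketch models a configuration and control processor reaching the configuration, command and status registers of several user macro-cells through one shared address space. All names (MacroCellRegisters, CmiBus), the register offsets, the address stride of four and the rule that the status register simply echoes the last command are assumptions of this sketch; the description does not specify the CMI_bus protocol or any address map.

```python
class MacroCellRegisters:
    """Register based hardware primitives embedded in one user macro-cell."""

    CONFIG, COMMAND, STATUS = 0, 1, 2   # hypothetical register offsets

    def __init__(self):
        self.regs = [0, 0, 0]

    def write(self, offset, value):
        if offset == self.STATUS:
            return False                 # status is read-only from the bus side
        self.regs[offset] = value
        if offset == self.COMMAND:
            # Toy behavior only: the status register echoes the last command.
            self.regs[self.STATUS] = value
        return True

    def read(self, offset):
        return self.regs[offset]


class CmiBus:
    """Shared bus: one address space decoded into (macro-cell, register)."""

    STRIDE = 4   # hypothetical number of addresses reserved per macro-cell

    def __init__(self, cells):
        self.cells = cells   # list index acts as the macro-cell select

    def _decode(self, address):
        if address < 0:
            return None, None
        select, offset = divmod(address, self.STRIDE)
        if select >= len(self.cells) or offset > MacroCellRegisters.STATUS:
            return None, None            # unmapped address
        return self.cells[select], offset

    def write(self, address, value):
        cell, offset = self._decode(address)
        return cell is not None and cell.write(offset, value)

    def read(self, address):
        cell, offset = self._decode(address)
        return None if cell is None else cell.read(offset)
```

In this model adding a macro-cell only extends the list passed to CmiBus, whereas a multi-port arrangement would need a further dedicated port on the centralized block for each added hardware primitive.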

The drawbacks of the CMI architecture are: 1. The user is forced to design hardware primitives "ad hoc" for the specific application. 2. In the same way macro-cell drivers are designed as a single unstructured code; this way any new application specific device requires a new relative driver designed from scratch. 3. Since the hardware primitives are embedded in user macro-cells, when the macro-cell implementing the microprocessor interface CMI changes, a certain amount of redesign of the hardware primitives embedded in user macro-cells is required to allow interfacing with the new CMI. 4. The unique microprocessor interface, block CMI, is designed ad hoc for the particular microprocessor interfaced, and there is no evidence in the art that effective circuit facilities are provided to simplify a possible change of microprocessor type. 5. A certain redesign of user macro-cells is also required when one of the two different clock domains changes; this is due to the embedding in user macro-cells of their clock domain side of the synchronization circuit. 6. The architecture is feasible in case of chip implementation but not in case of multi-chip board implementation. This is manifest for the CMPI topology of interconnection with user macro-cells. FIG. 5 shows the ASIC (Application Specific Integrated Circuit) implementation of a CMI with CMPI topology. An equivalent FPGA bread-boarding implementation of this ASIC is realized by replacing each "cluster" of macro-cells and the block CMI itself with an FPGA, with all devices hosted on a board. Since, in the case of CMPI topology, the number of ports on the CMI equals the number of hardware primitives connected to it, the number of pins required by the CMI macro-cell can exceed the number of pads of the FPGA charged to host the CMI.
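The pin count remark at point 6 can be made concrete with a rough estimate. The formulas below are assumptions introduced only for illustration: each CMPI port is taken to need its data lines plus a fixed number of control signals, while the bus based alternative is taken to need one shared data bus, enough address lines to select every primitive and the same control signals. The figures used (40 primitives, 8-bit ports, 2 control signals) are hypothetical and do not come from the description.

```python
def cmpi_pins(num_primitives, port_width, control_signals=2):
    """Pins on the centralized block with one dedicated port per primitive."""
    return num_primitives * (port_width + control_signals)


def cbbi_pins(num_primitives, data_width, control_signals=2):
    """Pins on the centralized block with one shared bus (assumed model)."""
    # Address lines needed to select one of num_primitives resources.
    address_bits = max(1, (num_primitives - 1).bit_length())
    return data_width + address_bits + control_signals


# Hypothetical example: 40 hardware primitives, 8-bit ports, 2 control signals.
print(cmpi_pins(40, 8))   # 400
print(cbbi_pins(40, 8))   # 16
```

Under these assumptions the multi-port pin count grows linearly with the number of hardware primitives, which is why it can exceed the pads available on the FPGA hosting the CMI, while the shared bus count grows only with the logarithm of that number.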

Points 1 to 4 lengthen the design phase impacting on time to market. Point 5 impacts on portability IC to board and vice versa. Nowadays the last aspect is particularly important for rapid prototyping of systems. Rapid prototyping of systems consists in realizing a prototype of a system via FPGA bread boarding. This is useful to explore the correctness of a system's architecture before the production of the system starts. In general for large productions the system will be finally integrated into an ASIC following a strategy of System On a Chip (SOC). "Classic" architectures very often are not compliant to this requirement of portability between ASIC implementation and an FPGA bread boarding implementation.

The macro-cell which implements the local bus interface interfaces the local bus on one side and the user macro-cells which need to transmit and/or receive stream data on the other side. The number of interfaced user macro-cells is generally low. The local bus interface is charged to: 1. interface the local bus; 2. interface user macro-cells performing: transmission and/or reception of stream data flows to/from user macro-cells. The main purpose of local bus interfaces is expressed at point 2, even if in many designs it is also used to perform functions typical of microprocessor interfaces (as in Board01 and Board11 of FIG. 2 or in FIG. 4); for this reason a third point can be added: 3. interface user macro-cells performing: configuration setting (on user macro-cells), command issuing (to user macro-cells) and status retrieving (from user macro-cells).

Let's consider a board being a subsystem, or the system itself. In the "classic" approach, named for simplicity Centralized Local Bus Interface (CLBI), a unique local bus interface block, also named CLBI in the following Figures, is present in the hierarchy of the subsystem constituted by the board. Moreover each block of each user macro-cell in the system which needs to be operated by the local bus is directly interfaced with the block CLBI via a bus named CLBI_bus in the following Figures. As in the case of the microprocessor interface, point 3 above can be seen as a set of services offered to the software running on an agent which controls the local bus, in terms of primitives able to operate on the interfaced user macro-cells. All the considerations made in the case of the microprocessor interface are still valid, i.e. when a local bus interface is used for configuration and control purposes, the list of drawbacks of the "classic" solution is the same as described for the CMI interface.

Point 2 above can also be seen as a set of services offered to the software running on the agent which controls the local bus, in terms of primitives able to operate on the interfaced user macro-cells. To implement these services a certain amount of logic, based on memory and FSMs (Finite State Machines), is required: this constitutes the set of hardware primitives (READ FIFO CONTROLLER, WRITE FIFO CONTROLLER, and MEMORY CONTROLLER). Moreover firmware has to be developed to handle the hardware primitives: this constitutes the set of firmware primitives named drivers. As in the case of block CMI, for the sake of simplicity, hereinafter, all the hardware primitives plus memories and FIFOs that can be interfaced with block CLBI will be referred to as resources.

The following solutions present on the market are consistent with the CLBI architecture but differ in the topology of interconnection between the CLBI block and user macro-cells: 1. A so called Centralized Multi-Port Interface to user macro-cells (CMPI) based on a set of ports, each one dedicated to a specific service. The services are embedded into user macro-cells and developed by users. Several different implementations are possible; 2. A so called Centralized Bus Based Interface (CBBI) to user macro-cells based on a bus; services, embedded into user macro-cells, have to be developed by the user in such a manner as to be consistent with the bus protocol.

FIG. 7 shows an ASIC implementation of the unique local bus interface Centralized Local Bus Interface (CLBI) when it assumes the architecture of a Centralized Bus Based Interface (CBBI). The ASIC is organized in four clusters of macro-cells plus the block CLBI. There are four user developed macro-cells: MC1, MC2, MC3 and MC4 and two macro-cells bought on the market, respectively IP1 and IP2. The memory resources are constituted by two local memories LM1 and LM2, a READ FIFO RF1, a WRITE FIFO WF1 and a MEMORY M1. The rectangles close to each memory resource represent the controllers of the respective memory resources. The READ FIFO CONTROLLER drawn close to RF1, the WRITE FIFO CONTROLLER drawn close to WF1, and the MEMORY CONTROLLER drawn close to M1 represent FSM based hardware primitives which implement a set of services offered to the software running on the agent which controls the local bus. The rectangles drawn into the macro-cells MC1, MC2, MC3, MC4, IP1 and IP2 represent register based hardware primitives which implement a set of services offered to the software running on the agent which controls the local bus. From the Figure it is evident that each interfaced resource is connected to the bus CLBI_bus.

Several IP interfaces that exhibit the CMPI topology of point 1 are present on the market; they are listed in the following points 1a, 1b and 1c. Point 1a concerns a centralized interface to user macro-cells based on a read port (FIFO based) and a write port (FIFO based) with no address bus available, like in a bridge. Point 1b concerns a centralized interface to user macro-cells based on memory mapped I/O. Two dual port RAMs are used to exchange data to and from user macro-cells, which are mapped to the memories. Point 1c concerns a more structured solution combining solution 1a or 1b for burst transactions with a bussed interface like CBBI for configuration and control purposes.

FIG. 8 shows an ASIC implementation of the architecture presented at point 1c. The CMPI (Centralized Multi-Port Interface) interconnection topology is used for a burst read port and a burst write port, while the CBBI (Centralized Bus Based Interface) interconnection topology is used for a bussed port (not burst capable) devoted to configuration and control of macro-cells. The ASIC is organized in four clusters of macro-cells plus the CLBI (Centralized Local Bus Interface). There are four user developed macro-cells: MC1, MC2, MC3 and MC4 and two macro-cells bought on the market, respectively IP1 and IP2. The memory resources are constituted by two LOCAL MEMORIES LM1 and LM2, a READ FIFO RF1 and a WRITE FIFO WF1. RF1 and WF1 are respectively connected to the burst read port and to the burst write port. The rectangles close to each memory resource represent the controllers of each memory resource. The READ FIFO CONTROLLER drawn close to RF1 and the WRITE FIFO CONTROLLER drawn close to WF1 represent FSM based hardware primitives which implement a set of services offered to the software running on the agent which controls the local bus. In their turn, the rectangles drawn into the macro-cells MC1, MC2, MC3, MC4, IP1 and IP2 represent register based hardware primitives which implement a set of services offered to the software running on the agent which controls the local bus. All the register based hardware primitives are connected to the bus CLBI_bus.

In conclusion, architecture CMPI (Centralized Multi-Port Interface) of the point 1 is related to specific applications (like a bridge). On the contrary architecture CBBI (Centralized Bus Based Interface) of the point 2, even if more complex, is the most general and versatile, especially if many user macro-cells have to be interfaced.

The architectures at the points 1a, 1b and 1c are oriented to interface up to two applications capable of burst (one in read and one in write). When more burst capable applications are involved, the application's designer has to grant the contended access to the unique interface.

The architectural drawbacks of the CLBI (Centralized Local Bus Interface) topology are the same as those described in the case of CMI (Centralized Microprocessor Interface). The only difference is that in this case the resources interfaced with CLBI may be different from the ones interfaced with CMI. More precisely, when CLBI is employed for the goals described at point 3 the interfaced resources embedded in the macro-cells are the same as in the case of CMI, while when CLBI is employed for the goals described at point 2 the interfaced resources are FIFOs and memories.

All the architectures discussed above can be combined in applications; for instance IC00 of FIG. 3 (on Board00 of FIG. 2) uses a CMI macro-cell and a CLBI macro-cell, while IC111 of FIG. 4 (on Board11 of FIG. 2) uses only a CLBI macro-cell.

Until now the drawbacks of the known interfacing architectures have been outlined with regard to the hardware implementation of the macro-cells; these architectural drawbacks are also reflected in restrictions on the hardware-related protocol which governs transactions on the CLBI_bus. In particular, the interfacing of resources in the known architectures at points 1a, 1b and 1c appears quite rigid. That is to say, transactions from the unique block CLBI toward the complex of interfaced macro-cells, and vice versa, require a great deal of design effort dedicated to the FSMs embedded in the macro-cells to cope with the various interfacing transactions; this makes the design of the interfacing protocol very cumbersome and not portable.

The lack of a modular structure in the known interfacing architectures is also reflected in a similar lack in the software design of drivers for specific devices based on macro-cells. In fact drivers of the known art are generally designed as single unstructured codes in the same way as the code of the respective applications (macro-cell devices), which are historically designed as belonging to a single entity (CMI). In this way any new device requires a new driver designed from scratch. The contemporary hardware design style, oriented to the use of reusable functional blocks (macro-cells), should permit a more rational design style for device drivers too. Nevertheless, device drivers are nowadays still written in the traditional way. A plausible explanation is the absence in the art of a modular and well-structured microprocessor-to-macro-cells interface able to stimulate a new software design for drivers.

A serious attempt to reduce time-to-market and allow maximum subsystem re-use in systems that span a wide range of performance characteristics is disclosed in U.S. Pat. No. 5,948,089 (Sonics, Inc.). The relevant claim 1 recites textually: "A computer bus system comprising: a synchronous bus operative during a number of bus cycles, said number of bus cycles divided into recurring frames, each frame further divided into packets comprising at least one clock cycle; at least one initiator subsystem coupled to the bus, the at least one initiator subsystem configured to have at least one packet pre-allocated to the at least one initiator subsystem, said initiator subsystem configured to send out a request during a clock cycle within the at least one pre-allocated packet, said request comprising a command and further comprising an address of a target subsystem; at least one target subsystem, said target subsystem configured to receive the address of the request and determine if the address corresponds to an address of the target subsystem, wherein if the address of the request corresponds to an address of the target subsystem, said target subsystem responds to the request on a second clock cycle."

In the introductory part of the cited patent document it is clearly stated that the computer bus works in such a way as to de-couple the frequency of the bus from the operating frequencies of the various client subsystems. In this way each subsystem may operate based on its own requirements, and the subsystem interface modules need not be redesigned when the operating frequency of the bus is increased. To meet said requirements, the system visible in FIG. 1 of the citation substantially discloses a fully pipelined fixed-latency communication system based on a computer bus. Said computer bus is shared among various initiator/target subsystems in which initiators have the capability to act, in turn, as a master (or slave) while targets are always slaves. Because of the shared resources, the problem of subdividing transmission bandwidth among the various initiator and target subsystems arises. Sonics' invention solves the outlined problem by importing into the computer bus world some solutions well known from ATM (Asynchronous Transfer Mode) networks. Those networks implement a protocol suitable for asynchronously transferring serial packets (cells) to/from various nodes of a telecommunication network. Packets are made of a fixed number of serial octets of bits queued into relevant sending/receiving buffers, respectively located at the two sides of a switching matrix that provides for routing. For this purpose packets have a header for their identification by means of a label, in addition to an information field available for the user's needs. The header also includes further information that pertains to the ATM layer functionality itself, for example for bandwidth control. A communication computer inside each node manages the packet consumption at the various queues by implementing a policy which takes into account the bandwidth requirements of the different users, in terms of bit-rate.
To do so a protocol negotiation phase is foreseen in which the guaranteed bandwidth is set first, then the residual bandwidth is distributed among the various requesters by means of a token mechanism. Communication media supporting serial ATM packets are either physical carriers, like optical fibers or coaxial cables, or radio connections for digitally modulated serial data. As known from techniques concerning serial transmission, a framed timing structure of the bit-stream is needed for synchronization purposes. So in synchronous transport layers for ATM, such as STM-n streams (Synchronous Transport Module-n) belonging to SDH (Synchronous Digital Hierarchy) links, the ATM asynchronous cells are fitted into SDH frames and therein synchronized by exploiting cell multiplexing/de-multiplexing provisions. The framed packet feature is also reproduced in Sonics' invention, as clearly recited in its claim 1.

From the above arguments it can be argued that Sonics' on-chip computer bus system is something more than a computer interface: it seems to include all the relevant features of a true communication network interface (ATM, LAN, Token Ring, etc). The only difference between Sonics' invention and classic communication network interfaces is that in the second case interfaces are connected to a physical medium suitable for serial transport (coaxial cable, optical fiber), while in the first case the interface uses a modified computer bus made of parallel metallic paths inside a chip or, at most, extended within the boundary of a board. In conclusion, Sonics' computer bus adopts a distributed architecture managed by mixed criteria pertaining both to token ring and to ATM networks, with the precise aim of promoting re-use of the silicon subsystem designs and reducing the time-to-market of on-chip devices.

In the Applicant's opinion different solutions are possible to reach the same goals (i.e. reuse and lower time-to-market) without forcing a designer to implement complex features typical of a communication system interface instead of the simpler features of a processor interface, in case only the latter is needed. Framing and packetization of bus cycles, successive storing of information concerning bandwidth requirements of the multiple subsystems and respective selection by two arbitration levels turn out to be additional demanding features when only a processor interface towards a plurality of target subsystems is really implemented.

From a logical point of view a boundary should in any case exist between a true communication system interface, more suitable for a computer network, and a simpler processor interface. A processor interface generally deals with transactions between a single microprocessor and a plurality of target devices appended to the processor bus. Target devices interface the processor either directly or, preferably, indirectly by means of a common module connected to a standard processor bus, at one side, and to a proprietary bus and/or point-to-point links towards all the target devices, at the other side. Many examples have already been discussed above in connection with the prior art architectures.

By comparison, communication network interfaces (such as, in the Applicant's opinion, the Sonics' computer bus) exploit communication media in order to extend communication facilities to a plurality of processors and devices. In the framework of communication networks the most relevant problem to be solved at the interface level is how to regulate multiple accesses to the common media from the various contenders, in order both to avoid conflicts and to meet different bandwidth requirements. This problem, although important, is not as pressing in a simpler processor interface and can be solved by means of traditional arbitration methods such as round-robin, possibly modified as per a variant of the present invention to improve performance with a distributed architecture. Contrary to Sonics' communication system and to computer network interfaces in general, a processor interface takes great advantage from burst transactions. A burst is a sequence of bus transactions occurring on consecutive bus cycles and implying address increment or decrement. Resorting to burst transactions in processor interfaces having a distributed architecture requires the solution of some attendant problems. The Applicant's invention solves these problems by means of a so-called "PREFETCHABLE FIFO" which extends burst opportunities to the distributed resources embedded in the processor interface. Burst transactions are hardly applicable in communication systems like the Sonics' invention, because the mechanism that supports bursts is inconsistent with mechanisms for distributing bandwidth through a particular access policy. More precisely, the longer the bursts, the more profitable they are for reducing subsequent latency (this argument will be detailed later); so by using long bursts, or by frequent recourse to shorter bursts, the TDMA (Time Division Multiple Access) method adopted to distribute bandwidth fairly is paralyzed.
It is useful to recall that Sonics' invention implements a two level arbitration scheme where the first level of arbitration is a framed time-division-multiplexing arbitration scheme and the second level is a fairly-allocated round-robin scheme implemented using a token-passing mechanism: that is to say, two TDMA methods.
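By way of purely non-limiting illustration, the two level arbitration recalled above may be sketched as follows. The sketch is not taken from the cited document: the function name arbitrate, the representation of a frame as a list of slot owners and the rule that an unused pre-allocated slot falls to the token holder are all assumptions made here for explanatory purposes.

```python
from typing import List, Optional, Set


def arbitrate(frame: List[Optional[str]],
              requests: List[Set[str]],
              ring: List[str]) -> List[Optional[str]]:
    """Return the initiator that wins each bus slot (hypothetical model).

    frame    -- owner pre-allocated to each slot of the recurring frame
                (None marks a slot with no pre-allocated owner)
    requests -- for each successive slot, the set of requesting initiators
    ring     -- order in which the token circulates (second level)
    """
    token = 0  # index in `ring` of the initiator currently holding the token
    winners: List[Optional[str]] = []
    for slot, wanting in enumerate(requests):
        owner = frame[slot % len(frame)]
        if owner is not None and owner in wanting:
            # First level: the pre-allocated owner uses its own slot.
            winners.append(owner)
            continue
        # Second level (assumed): the token travels round the ring until
        # it reaches an initiator that is requesting the bus.
        winner = None
        for step in range(len(ring)):
            candidate = ring[(token + step) % len(ring)]
            if candidate in wanting:
                winner = candidate
                token = (token + step + 1) % len(ring)
                break
        winners.append(winner)
    return winners
```

In this model every slot is assigned independently of the previous one, which illustrates why a long burst by a single initiator does not fit naturally into such a scheme.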

The main shortcoming of the Sonics' communication system has been outlined, namely its inability to improve design reuse and reduce the time-to-market of the relevant on-chip devices without introducing technical features typical of computer network interfaces. In applications addressed to usual microprocessor interfaces those features would appear as additional and overly binding ones.

PURPOSES OF THE INVENTION

The main purpose of the present invention is to indicate a microprocessor, or local bus, interface having a modular distributed architecture towards user macro-cells, encompassing a standardizable set of resources variously configurable in accordance with the needs of the macro-cells.

A strictly related purpose of the invention is to indicate a PREFETCHABLE FIFO which exhibits to a certain extent a prefetchable behavior, in order to allow anticipative reading to speed up burst transactions in the distributed interface even when it is interfaced to not prefetchable resources.

A consequent main purpose of the invention is to indicate an interface bus protocol able to manage in a transparent way the round-trip latency of communication between the microprocessor and the resources, in order to avoid subsequent latency other than the initial delay and to optimize burst transactions through a bus of the interface continuously shared among distributed resources. Moreover said protocol is able to manage the PREFETCHABLE FIFO in order to allow anticipative read also when not prefetchable resources are involved.

A derived purpose of the invention is to further specialize a main block of the distributed architecture in order to de-couple, as far as possible, the main block of the interface from changes in the external microprocessor.

Another purpose of the invention is to exploit in a profitable way a two clock domains synchronizer circuit, developed by the same Applicant, in the new context of the distributed interface.

A further purpose of the invention is to indicate a software facility for writing driver code of devices based on user macro-cells interfaced with the distributed microprocessor interface in subject. Major benefits are driver reusability when a certain macro-cell has to be reused in a different context, and ease of development of new drivers starting from a driver skeleton referencing the standardizable set of resources indicated above.

Accordingly, the present invention overcomes all the restrictions of the prior art interfaces, in particular the one concerning the need for a large "ad hoc" design or redesign of the hardware and firmware resources embedded in the user developed macro-cells for the sole purpose of interfacing the configuration and control microprocessor, and not for the specific application of the macro-cell.

SUMMARY AND ADVANTAGES OF THE INVENTION

To achieve said purposes the subject of the present invention is an interface between a microprocessor, or local bus, and user developed macro-cells having the modular and distributed architecture described in claim 1.

The main module and the peripheral modules of the distributed interface of the present invention may be advantageously implemented as further macro-cells.

The distributed interface of the present invention has a great impact on the new user developed macro-cells. In fact, having removed from the corresponding user developed macro-cell of the prior art the majority of hardware primitives devoted only to interfacing the configuration and control microprocessor, a new simpler macro-cell basic architecture results which does not need the "ad hoc" interface design or redesign of the prior art. The transfer of all the above mentioned resources into configurable peripheral modules of the microprocessor distributed interface allows a highly uniform design of all kinds of macro-cells, whether belonging to the user or to the interface itself. The development and interfacing of user macro-cells is consequently made easier and the designer can concentrate mainly on the macro-cell function.

Another advantage of the distributed interface of the invention, due to its high modularity, consists in furnishing a scalable architecture. It is in fact convenient to provide the interface with only those peripheral modules whose resources are effectively exploited by the complex of user macro-cells.

Moreover the distributed interface, thanks to the presence of a COMMON-BUS, has a number of interconnections between the main and peripheral modules that is not excessively high; this makes the structure suitable for FPGA bread-boarding implementation. Because the proposed architecture is the same for both ASIC and FPGA, no changes are necessary when changing the implementation technology; consequently a rapid prototyping of an ASIC under development is possible in order to explore the correctness of the design and speed up the time to market.

The distributed interface disclosed in the claim 1, besides the standardizable peripheral resources, includes means expressly designed to optimize the data throughput across the overall interface. Efficient burst transactions are promoted by including in each peripheral module a filling status calculator indicating residual room in the selected resource either for writing or reading, and by introducing in the main module companion means for elaborating the remote filling status received from a selected peripheral resource in order to compensate the subsequent latency other than the initial one. The presence of these means greatly helps the development of an efficient interface bus protocol, as it will be seen in the following.

A peculiarity of the interface in subject, largely derived from the use of filling status dedicated means, is the capability to combine a modular distributed architecture with skill in burst transactions, despite the fact that a COMMON-BUS shared among a plurality of resources forces the main module to incur the initial latency whenever the selected resource changes as an effect of arbitration. The filling status dedicated means entirely support and prompt the interface bus protocol in the task of optimizing write burst transactions. Concerning the optimization of read bursts (even when not prefetchable resources are involved), supplementary means are provided, namely a centralized buffer memory and a PREFETCHABLE FIFO, as disclosed in the claims.

The buffer in the main module allows anticipative read of a selected resource, both prefetchable and not, until the buffer is nearly full. In this way the interface can be profitably employed for interfacing the widely diffused local buses obeying Information Technology (IT) protocols, which require a backward acknowledge signal for each single datum read from the buffer by the external processor. At the end of a current transaction the data not read by the external microprocessor or generic bus master are purged from the buffer, so as to assign an empty buffer to the next requester. The buffer being a traditional FIFO, the purged data would be definitively lost if the peripheral resource connected to the buffer were also a not prefetchable traditional FIFO. This is not true for a PREFETCHABLE FIFO expressly developed to be used as a peripheral in tandem with a centralized buffer inside the main module. This is possible because the PREFETCHABLE FIFO simply rewinds over the data transferred into the buffer but not read by the processor or generic bus master; consequently the only data overwritten in the PREFETCHABLE FIFO are those effectively read by the processor.
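By way of purely non-limiting illustration, the rewinding behavior described above may be modelled in software as follows. This is a behavioral sketch only, not the circuit of the invention: the class name PrefetchableFifo, the method names and the use of a single prefetch counter are assumptions made here for explanatory purposes.

```python
from collections import deque


class PrefetchableFifo:
    """Behavioral model of a FIFO that tolerates anticipative reads."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.data = deque()   # entries not yet confirmed as read
        self.prefetched = 0   # entries handed to the buffer, unconfirmed

    def write(self, value: int) -> None:
        """Macro-cell side: store one datum if there is room."""
        if len(self.data) >= self.depth:
            raise OverflowError("FIFO full")
        self.data.append(value)

    def prefetch(self) -> int:
        """Anticipative read: hand out the next datum without freeing it."""
        if self.prefetched >= len(self.data):
            raise IndexError("nothing left to prefetch")
        value = self.data[self.prefetched]
        self.prefetched += 1
        return value

    def end_transaction(self, consumed: int) -> None:
        """Free only the data effectively read by the bus master, then
        rewind over the data prefetched but purged from the buffer."""
        if not 0 <= consumed <= self.prefetched:
            raise ValueError("cannot consume more than was prefetched")
        for _ in range(consumed):
            self.data.popleft()
        self.prefetched = 0
```

In this model, data prefetched into the buffer and then purged remain stored in the FIFO and are offered again at the next transaction, whereas a traditional FIFO would already have released them.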

Thanks to the combination of means such as those devoted to the filling status, the centralized buffer and the PREFETCHABLE FIFO, together with a smart protocol able to exploit their synergies, the interface of the invention reaches optimum efficiency in burst transactions both in read and in write, both with prefetchable and with not prefetchable resources. In this way the advantage of the distributed architecture of the present interface is fully exploited.

A further advantageous architectural characteristic of the interface is a second level of modularity inside the main module, as disclosed in the dependent claims. To meet the second level of modularity, the main module consists of a centralized circuit which controls various specialized circuits largely decoupled from each other. The centralized circuit collects most of the relevant events occurring on the two directions of the COMMON-BUS and on the external-bus, and also collects the grant signals generated by an internal arbiter and at various significant points of the circuit. This further enhanced architecture of the interface makes it easier to change the external microprocessor or generic bus master without affecting the user macro-cells and the distributed interface, other than a command decoder dedicated to cope with the new microprocessor and a sequencer acting on the external-bus. By contrast, in "classic" solutions which admit a bus, the absence of a centralized circuit governing separate functional parts forces the designer to undertake a large revision of the bus controller at the two sides of the interface, that is to say in the unique block and inside the user macro-cell.

Another subject of the invention is an interface COMMON-BUS protocol which makes the interface of claim 1 operative, as disclosed in the respective independent claims. The protocol of the invention works optimally with burst transactions because of its insensitivity to the subsequent latency, contrary to the majority of the known protocols. The goal is reached by tracing the filling status of remote peripheral resources locally to the main module. The tracing is performed both at the start and continuously during a read/write burst transaction, with the precise aim of anticipating locally to the main module the filling status as calculated by a remote resource (remote because of the latency). In this way the only latency remaining in the transaction is the unavoidable initial round trip delay. Advantageously, for tracing the filling status the protocol has at its disposal a pure query command to catch the initial delay, and a read/write command coupled with a query to have the remote filling status of the selected resource returned on the COMMON-BUS while the read/write operations take place. This method is necessary because, while an access of the microprocessor or generic bus master to a peripheral resource is immediately known to the main module which manages the interface protocol and executes the filling status algorithms, a macro-cell access to the same resource becomes known to the manager only after the latency between the peripheral and the main module has elapsed.

For correctly anticipating the remote filling status locally to the main module, a first specialized algorithm traces the variation of the remote filling status of a not prefetchable resource due to possible read/write accesses of the interconnected macro-cell, independently of the accesses of the microprocessor master.

A second specialized algorithm continuously updates the first remote filling status received after the initial latency has elapsed, by summing the traced variation and subtracting one unit each time a datum is transferred between the external bus master and the main module (when the external bus master writes data into the main module) or a read command is issued on the COMMON-BUS (when the external bus master reads data from the main module). In this way the updated filling status is a local image (as precise as possible) of the remote filling status. Said local filling status is put at the disposal of the protocol to take immediate decisions concerning continuation or termination of a burst transaction, without the risk of crossing the boundaries of the not prefetchable resource actually selected.
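By way of purely non-limiting illustration, the bookkeeping performed by the second algorithm may be sketched as follows. The class and method names are hypothetical and the sketch models only the arithmetic described above (initial remote value, plus the traced variation, minus one unit per transferred datum or issued read command), not the actual circuit.

```python
class LocalFillingStatus:
    """Local image, kept in the main module, of a remote filling status."""

    def __init__(self, initial_remote_status: int) -> None:
        # First value returned by the resource after the initial latency.
        self.value = initial_remote_status

    def add_traced_variation(self, delta: int) -> None:
        """Apply a variation caused by the interconnected macro-cell.

        Assumed non-negative here: such variations are taken only to
        increase the data to read or the room to write.
        """
        if delta < 0:
            raise ValueError("macro-cell variations are assumed non-negative")
        self.value += delta

    def account_transfer(self) -> None:
        """Subtract one unit for a datum written by the bus master or a
        read command issued on the COMMON-BUS."""
        if self.value == 0:
            raise RuntimeError("burst must terminate: no room or data left")
        self.value -= 1

    def burst_may_continue(self) -> bool:
        """Immediate local decision, taken without a new round trip."""
        return self.value > 0
```

For example, starting from a remote value of 3, two transfers followed by a traced variation of 2 and one more transfer leave a local value of 2, so the burst may continue without waiting for a new round trip.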

The precision of the remote filling status anticipated locally to the main module is based on two assumptions. The first one is that the datum written from the external bus master to the main module is considered effectively transferred when the handshake for it has taken place (at the interface between the external bus master and the main module). In a similar way the datum read from a macro-cell connected to a peripheral resource is considered effectively read when the handshake for it has taken place (at the interface between the external bus master and the main module). The second assumption is that the filling status locally anticipated in the main module is equal to or more pessimistic than the real one. This implies that possible variations in the remote filling status of the resource due to the interconnected macro-cell do not produce overflow or underflow of the resource while the datum is on the fly to/from the resource. Both assumptions are generally satisfied. In fact, firstly, when the command to read/write a datum is put on the COMMON-BUS it is executed without suspensive conditions except in case of failure. Secondly, the variations in the remote filling status due to an interconnected macro-cell can only increase its value, both in read (more data to read) and in write (more room to write data).

Furthermore, for convenience, the filling status is saturated at a value greater than or equal to the maximum round-trip latency that a transaction takes on the COMMON-BUS, the value of the latency being rounded up to the next integer. Advantageously, the binary value expressing the filling status saturated as indicated above allows a precise visibility of the residual room in the resources, either for writing or reading, within the time window of the round trip latency, avoiding the use of unnecessarily wide buses.
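By way of purely non-limiting illustration, the effect of the saturation on the width of the filling status field may be worked out as follows. The function names are hypothetical; the latency is assumed here to be expressed in clock cycles, and the smallest admissible saturation value (the latency rounded up) is chosen.

```python
import math


def saturation_value(max_round_trip_latency: float) -> int:
    """Smallest admissible saturation value: the latency rounded up."""
    return math.ceil(max_round_trip_latency)


def saturate(filling_status: int, limit: int) -> int:
    """Report the exact filling status up to the limit, the limit beyond."""
    return min(filling_status, limit)


def field_width_bits(limit: int) -> int:
    """Number of bus lines needed to carry values from 0 to limit."""
    return max(1, limit.bit_length())
```

Under these assumptions a maximum round-trip latency of 4.2 cycles gives a saturation value of 5, which fits in 3 bits regardless of the depth of the resource.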

The modularity of the protocol matches the modularity of the hardware; consequently the various algorithms composing the protocol are easily assigned to as many separate circuit parts, in both the main and the peripheral modules, although most of the algorithms are charged to the main module. The protocol itself can be seen as a plurality of algorithms able to run concurrently and variously combined together by a supervisor algorithm to cope with the different types of transactions arising from different cases of mastership of the external-bus in write or read. This arrangement allows great advantages in terms of reusability of the design. The advantages of the hardware architecture are reflected in the protocol which, contrary to known ones, places a minimal burden on the macro-cells.

The philosophy of the distributed interface in subject is to provide interfacing resources externally to the user macro-cells. In line with this philosophy, user macro-cells should also be relieved of the synchronization tasks. A synchronization problem arises when macro-cells belong to a clock domain different from that of the interface. In case a peripheral resource interfacing macro-cells is a dual port RAM or a FIFO, synchronization is generally carried out by the respective controller in a known way. On the contrary, when peripheral resources are registers the known art synchronizers need some ad hoc logic embedded in the macro-cells, in addition to the interface logic. As far as synchronization is concerned, the aims of the distributed interface in subject are achieved with peripheral resources of register type by exploiting a separate invention which is the subject of a patent application in the name of the same Applicant. A two clock domains synchronization circuit is disclosed in that application. The circuit has been expressly designed to be simply interposed between the two domains to be synchronized. It follows that ad hoc logic embedded in the macro-cells is no longer necessary. Thanks to this peculiarity such a synchronizer becomes part of the distributed interface from an architectural point of view, because it offers to an interfaced macro-cell a synchronization service complementary to the service offered by the resource itself. Consequently any possible complication in adapting the disclosed synchronizer to the various types of registers is a matter confined inside the interface. The synchronization circuit can be advantageously seen as an independent element of an HDL (Hardware Description Language) list, able to further increase the modularity of the design of the distributed interface when it operates in an asynchronous way.

Further subject of the invention is a software facility for writing driver code for devices based on user macro-cells interfaced to a microprocessor through the distributed microprocessor interface of the claim 1, as disclosed in the respective independent claims.

The software facility is directed to a map file for assigning a different physical address to each symbolic address included in a declaration part collecting the arguments of all the basic Functions building up the driver, macro-cell by macro-cell. The modular architecture of the distributed interface helps in writing the absolute addresses of the map file when the symbolic driver has to be allocated multiple times or relocated. This frees hardware designers to deliver drivers in symbolic code (no address binding performed) and test them after relocation, increasing the portability of the drivers themselves. Summarizing, the advantages of said software facility consist in the independence from operating system specific calls and in the portability of the same drivers across different hardware (integration platform of reused macro-cells) and software (Operating System) environments. Said features are obtained by writing the low level drivers in terms of O.S. independent calls referencing directly the standardizable peripheral resources introduced above, and by providing a separate map file to perform address binding for the selected environment. The low level drivers will then be encapsulated by calls of the specific O.S. used in the application.
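By way of purely non-limiting illustration, the separation between a symbolic driver and a map file performing the address binding may be sketched as follows. The map file syntax (NAME = address), the symbolic names MC1_CONFIG, MC1_COMMAND and MC1_STATUS, the addresses and the dictionary standing in for the bus are all hypothetical and are not taken from the present description.

```python
from typing import Dict

# Two hypothetical map files binding the same symbols in two environments.
MAP_BOARD_A = """
MC1_CONFIG  = 0x1000   # configuration register
MC1_COMMAND = 0x1004   # command register
MC1_STATUS  = 0x1008   # status register
"""
MAP_BOARD_B = """
MC1_CONFIG  = 0x8000
MC1_COMMAND = 0x8004
MC1_STATUS  = 0x8008
"""


def parse_map_file(text: str) -> Dict[str, int]:
    """Bind each symbolic address to a physical address."""
    binding: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        name, address = (part.strip() for part in line.split("=", 1))
        binding[name] = int(address, 0)       # accepts 0x... notation
    return binding


class Driver:
    """O.S. independent access layer working on symbolic addresses only."""

    def __init__(self, binding: Dict[str, int], bus: Dict[int, int]) -> None:
        self.binding = binding
        self.bus = bus  # stand-in for the real bus access primitives

    def write(self, symbol: str, value: int) -> None:
        self.bus[self.binding[symbol]] = value

    def read(self, symbol: str) -> int:
        return self.bus.get(self.binding[symbol], 0)


def start_macro_cell(driver: Driver) -> int:
    """Symbolic driver code: contains no physical address."""
    driver.write("MC1_CONFIG", 0x3)
    driver.write("MC1_COMMAND", 0x1)
    return driver.read("MC1_STATUS")
```

The same function start_macro_cell can thus be run unchanged against either binding; only the map file differs between the two environments.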

BRIEF DESCRIPTION OF THE DRAWINGS

Further objects and advantages of the present invention will be made clear by the following detailed description of an embodiment thereof and the annexed drawings given for purely non-limiting explanatory purposes and wherein:

FIG. 1 shows a diagrammatic representation of a known macro-cell;

FIG. 2 shows a typical assembling of a few interconnected boards/cards including macro-cells of FIG. 1 and relative microprocessor/local bus interfaces;

FIGS. 3 to 8 show several general known architectures of interfaces between a microprocessor or a local bus and clusters of FIG. 1's macro-cells;

FIGS. 9 to 11 show several general architectures of interfaces between a microprocessor/local bus and clusters of user developed macro-cells according to the object of the present invention;

FIGS. 12 and 13 show the detailed structure of the COMMON-BUS visible in FIGS. 9 to 11;

FIG. 14 shows a diagrammatic representation of a block DMI MAIN (DMI ROOT) belonging to the microprocessor/local bus interfaces of FIGS. 9 to 11 of the invention;

FIG. 15 shows a most general diagrammatic representation of a block DMI PERIPHERAL (DMI LEAF) connected to a MACRO-CELL, both included in a CLUSTER OF MACRO-CELLS belonging to the microprocessor/local bus interfaces of FIGS. 9 to 11 of the invention;

FIG. 16 shows a layered representation of the block DMI PERIPHERAL (DMI LEAF) of FIG. 15;

FIG. 17 shows in detail a block PERIPHERAL COMMON BUS CONTROLLER only indicated in FIG. 16;

FIG. 18 shows a register block REGBLOCK of FIG. 17 and the relative interconnection to a user macro-cell;

FIG. 19 is the FIG. 18 plus a block REGBLOCK SYNCHRONIZER interposed between the block REGBLOCK and the user MACRO-CELL;

FIG. 20 details all the signals at the left side of each register embedded in block REGBLOCK either of FIG. 18 or 19;

FIG. 21 details all the signals at the two sides of each block Register Synchronizer embedded in block REGBLOCK SYNCHRONIZER of FIG. 19;

FIG. 22 shows a memory block MEMBLOCK of FIG. 16 and the relative interconnection to a user MACRO-CELL;

FIG. 23 shows a FIFO block FIFOBLOCK of FIG. 16 and the relative interconnection to a user MACRO-CELL;

FIG. 24 shows a detailed description of paired blocks COMMAND REGISTER and COMMAND REGISTER SYNCHRONIZER of FIG. 19;

FIG. 25 shows a detailed description of paired blocks STATUS REGISTER and STATUS REGISTER SYNCHRONIZER of FIG. 19;

FIG. 26 shows a detailed description of synchronization block P2P of FIGS. 24 and 25;

FIG. 27 shows some temporal wave-shapes concerning the operation of synchronization block P2P of FIG. 26;

FIG. 28 shows a time diagram of the synchronization operation carried out by means of synchronization block P2P of FIG. 26;

FIG. 29 includes FIG. 13 plus some blocks MACRO-CELL of FIG. 15, at one side, and other new blocks at the opposite side to interface an external microprocessor or local bus master;

FIGS. 30 and 31 show an example, typical of a microprocessor bus, of an asynchronous two-phase handshake in both read and write operations;

FIG. 32 shows a reference embodiment for the connection between the DMI MAIN (DMI ROOT) of present invention and a generic EXTERNAL BUS AGENT (either a microprocessor or a local bus controller) implementing a reference SYNCHRONOUS EXTERNAL BUS AGENT PROTOCOL. In the represented case the EXTERNAL BUS AGENT is supposed to act as master while the DMI MAIN is supposed to act as slave;

FIG. 33 shows a reference embodiment for the connection between the DMI MAIN (DMI ROOT) of present invention and a generic EXTERNAL BUS AGENT (either a microprocessor or a local bus controller) implementing a reference SYNCHRONOUS EXTERNAL BUS AGENT PROTOCOL. In the represented case the EXTERNAL BUS AGENT is supposed to act as slave while the DMI MAIN is supposed to act as master;

FIG. 34 is still the layer shown in FIG. 18, where both a DMA controller named TRANSACTION REQUESTER hosted in the MACRO-CELL and a set of signals to connect it to the DMI PERIPHERAL are evidenced. This logic enables the MACRO-CELL to request transactions from the DMI MAIN and represents the master mode of the DMI;

FIG. 35 indicates as many representations of transactions among overall algorithms building up the protocol governing the distributed microprocessor interface of FIG. 29 operating in slave mode. The Figure represents the case of DMI slave and EBA master, executing a read transaction that transfers data from a resource belonging to a DMI PERIPHERAL to the EBA (read from DMI PERIPHERAL to DMI MAIN plus read from DMI MAIN to EBA);

FIG. 36 indicates a message sequence chart related to the transactions of the preceding FIG. 35 in the case where the transaction is terminated by the EBA;

FIG. 37 indicates a message sequence chart related to the transactions of the preceding FIG. 35 in the case where the transaction is terminated by the DMI;

FIG. 38 indicates as many representations of transactions among overall algorithms building up the protocol governing the distributed microprocessor interface of FIG. 29 operating in slave mode. The Figure represents the case of DMI slave and EBA master, executing a write transaction that transfers data from the EBA to a resource belonging to a DMI PERIPHERAL (write from EBA to DMI MAIN plus write from DMI MAIN to DMI PERIPHERAL);

FIG. 39 indicates a message sequence chart related to the transactions of the preceding FIG. 38 in the case where the transaction is terminated by the EBA;

FIG. 40 indicates a message sequence chart related to the transactions of the preceding FIG. 38 in the case where the transaction is terminated by the DMI;

FIG. 41 indicates as many representations of transactions among overall algorithms building up the protocol governing the distributed microprocessor interface of FIG. 29 operating in master mode. The Figure represents the case of DMI master and EBA slave, executing a write transaction that transfers data from a resource belonging to a DMI PERIPHERAL to the EBA (read from DMI PERIPHERAL to DMI MAIN plus write from DMI MAIN to EBA);

FIG. 42 shows a diagrammatic representation of a block PREFETCHABLE FIFO belonging to the DMI PERIPHERAL of FIG. 23 of the invention;

FIGS. 43 to 50 illustrate the working principles of both the PREFETCHABLE FIFO and the protocol interfacing said PREFETCHABLE FIFO with the RX BUFFER in order to allow anticipative read of not prefetchable resources. A transaction is illustrated as a set of snapshots of the system composed of the PREFETCHABLE FIFO and the RX BUFFER;

FIGS. 51 to 55 are referred to the software architecture of drivers associated to the macro-cells belonging to the microprocessor/local bus interfaces of FIGS. 9 to 11 of the invention;

Appendix A indicates five tables, namely Table A1, Table A2, Table A3, Table A4, and Table A5. Table A1 is used by a block MAX BURST LENGTH TABLE shown in FIGS. 16 and 18. Table A2 is used to characterize the signals of FIGS. 20, 21, 24 and 25. Table A3 is used to explain read and write transactions indicated by arrows in FIG. 29. Table A4 describes both the meaning and the mimic of signals used by the two-phase handshake algorithm illustrated in FIGS. 30 and 31. Table A5 describes both the meaning and the mimic of the reference EXTERNAL BUS AGENT PROTOCOL introduced by FIGS. 32 and 33;

Appendix B indicates a main protocol for governing transactions on COMMON-BUS of FIG. 29, being the protocol further subject of the invention;

Appendix C indicates some Tables, numbered from C1 to C10, including relevant command, peripheral monitoring codes, and meaningful signals managed by the main protocol of Appendix B;

Appendix D, which embeds Tables numbered from D1 to D8, indicates as many algorithms building up the main protocol of Appendix B, taking into consideration the Tables of Appendix C;

Appendix E which embeds Tables numbered from E1 to E9 describes both the meaning and mimic of signals of PREFETCHABLE FIFO described in FIG. 42 and the algorithms executed by its internal blocks;

Appendix F which embeds Table F1 shows the address offset for each resource embedded into the REGBLOCK layer of FIG. 18;

Appendix G indicates a variant of an algorithm named CMDRF whose basic version is defined in Appendix D.

DETAILED DESCRIPTION OF AN EMBODIMENT OF THE INVENTION

With reference to FIG. 9 an embodiment of the invention is shown. The embodiment is related to the implementation of a multi-channel segmentation and re-assembly controller, suitable for broadband telecom applications, such as Broad-band ISDN (Integrated Services Digital Network) based on ATM (Asynchronous Transfer Mode) technology. The multi-channel segmentation and re-assembly controller is implemented as an ASIC device indicated as ASIC in the Figure. The ASIC is hosted on a board indicated as BOARD in the Figure. The board BOARD also hosts a microprocessor uP, a microprocessor to local bus bridge named uP/LB, an on board memory MB1 and an on board memory MB2. The microprocessor uP is connected to the bridge uP/LB via a bus named uPB. The bridge uP/LB and the ASIC are connected by means of a local bus indicated as LB. The ASIC is organized in four clusters of user developed macro-cells plus one block named DMI MAIN. In each of the four clusters an instance of a special block named DMI PERIPHERAL is present: in the first cluster (top-left side) DMI PERIPHERAL1, in the second cluster (top-right side) DMI PERIPHERAL2, in the third cluster (bottom-left side) DMI PERIPHERAL3 and in the fourth cluster (bottom-right side) DMI PERIPHERAL4. All the DMI PERIPHERALs are connected to a bus indicated as COMMON-BUS which departs from a side of the block DMI MAIN; the other side of the block DMI MAIN being reached from the local bus LB.

In a first cluster (top-left side) three macro-cells are embedded, other than DMI PERIPHERAL1, precisely: ATMU TX, UTOPIA TX and ALIGNER TX. The macro-cell UTOPIA TX embeds in its turn a LOCAL MEMORY LM1 and the related MEMORY CONTROLLER sketched as a small rectangle close to LM1. The macro-cell UTOPIA TX is connected to a functional primary output of the ASIC indicated with a small triangle and named DATA_TX. The macro-cell ALIGNER TX is connected to macro-cells UTOPIA TX and ATMU TX. The last macro-cell is connected to a macro-cell AAL TX PROTOCOL CORE belonging to a second cluster. Macro-cell DMI PERIPHERAL1 has two links towards ATMU TX and one towards ALIGNER TX macro-cells.

In a second cluster (top-right side) three macro-cells are embedded other than DMI PERIPHERAL2, precisely: SHAPER, AAL TX PROTOCOL CORE and AAL TX MEMORY MANAGER. The macro-cell SHAPER is connected to the macro-cell AAL TX MEMORY MANAGER, which is connected to the macro-cell AAL TX PROTOCOL CORE and to the on board memory MB2. The macro-cell AAL TX PROTOCOL CORE is further connected to the macro-cell ATMU TX located in the first cluster. The macro-cell DMI PERIPHERAL2 is in turn connected with all the three macro-cells of the cluster.

In a third cluster (bottom-left side) two macro-cells are embedded, named AAL RX PROTOCOL CORE and AAL RX MEMORY MANAGER, other than the macro-cell DMI PERIPHERAL3. The macro-cell AAL RX PROTOCOL CORE is connected to the macro-cell AAL RX MEMORY MANAGER and to a macro-cell ATMU RX belonging to a fourth cluster. The macro-cell AAL RX MEMORY MANAGER is also connected to the on board memory MB1. The macro-cell DMI PERIPHERAL3 is also connected to all the three macro-cells of the cluster.

In a fourth cluster (bottom-right side) three macro-cells are embedded, other than DMI PERIPHERAL4, precisely: ATMU RX, UTOPIA RX and ALIGNER RX. The macro-cell UTOPIA RX embeds in its turn a LOCAL MEMORY LM2 and the related MEMORY CONTROLLER sketched as a small rectangle close to LM2. The block UTOPIA RX is connected to a functional primary input of the ASIC indicated with a small triangle and named DATA_RX. The macro-cell ALIGNER RX is connected to the UTOPIA RX and ATMU RX macro-cells. The last block is also connected to the macro-cell AAL RX PROTOCOL CORE embedded in the third cluster. Macro-cell DMI PERIPHERAL4 has two links towards macro-cell ATMU RX and one link towards macro-cell ALIGNER RX.
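The cluster arrangement of FIG. 9 described above can be summarized, purely for illustration, as a small data structure. The sketch below is not part of the disclosure: the dictionary layout and the helper names `peripheral_for` and `point_to_point_link_count` are hypothetical. The link counts for DMI PERIPHERAL1 and DMI PERIPHERAL4 follow the text, while one link per macro-cell is assumed for DMI PERIPHERAL2 and DMI PERIPHERAL3, where the text does not give a count.

```python
# Illustrative model of the FIG. 9 clusters. Keys are the DMI PERIPHERAL
# instances; "links" maps each interfaced macro-cell to the number of
# point-to-point links towards it.
# ASSUMPTION: one link per macro-cell for DMI PERIPHERAL2 and DMI PERIPHERAL3
# (the text only says they are connected to the macro-cells of their cluster).
CLUSTERS = {
    "DMI PERIPHERAL1": {
        "macro_cells": ["ATMU TX", "UTOPIA TX", "ALIGNER TX"],
        "links": {"ATMU TX": 2, "ALIGNER TX": 1},
    },
    "DMI PERIPHERAL2": {
        "macro_cells": ["SHAPER", "AAL TX PROTOCOL CORE", "AAL TX MEMORY MANAGER"],
        "links": {"SHAPER": 1, "AAL TX PROTOCOL CORE": 1, "AAL TX MEMORY MANAGER": 1},
    },
    "DMI PERIPHERAL3": {
        "macro_cells": ["AAL RX PROTOCOL CORE", "AAL RX MEMORY MANAGER"],
        "links": {"AAL RX PROTOCOL CORE": 1, "AAL RX MEMORY MANAGER": 1},
    },
    "DMI PERIPHERAL4": {
        "macro_cells": ["ATMU RX", "UTOPIA RX", "ALIGNER RX"],
        "links": {"ATMU RX": 2, "ALIGNER RX": 1},
    },
}


def peripheral_for(macro_cell):
    """Return the DMI PERIPHERAL linked to a macro-cell, or None if no link is described."""
    for peripheral, cluster in CLUSTERS.items():
        if macro_cell in cluster["links"]:
            return peripheral
    return None


def point_to_point_link_count():
    """Total number of point-to-point links under the assumptions stated above."""
    return sum(sum(cluster["links"].values()) for cluster in CLUSTERS.values())
```

Under these assumptions, every macro-cell that needs configuration is reached through exactly one DMI PERIPHERAL, while all the DMI PERIPHERALs share the single COMMON-BUS towards DMI MAIN.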

In operation, the main functions implemented by the BOARD of FIG. 9 are those of ATM Adaptation Layer 5 (AAL5) referred to the standardized protocol OSI (Open System Interconnection), and of various ATM protocol layers. These functions are aggregated in two main sets: TX and RX. The operation is based on a scheduler (block SHAPER) to control the flow of transmitted data, and on a standardized interface (blocks UTOPIA). The transmit part of the board receives packets from the local bus LB, stores and segments them in ATM cells, and sends the cells to the Physical layer embedded by the ASIC. The DATA_TX acts as a primary output for the board. The receiver side of the board gets ATM cells from the primary input DATA_RX, authenticates, stores and reassembles them in packets, and sends the packets to the proper destination through the local bus LB.

More precisely, at the TX side the incoming packets are stored by the AAL TX MEMORY MANAGER macro-cell, that provides proper segmentation. The AAL5 protocol fields are calculated by the AAL TX PROTOCOL CORE macro-cell. The AAL segments are sent according to the transmission credits (tokens) provided by the SHAPER macro-cell. The ATMU TX macro-cell processes the AAL segments and prepares the ATM cells, that are re-timed by the ALIGNER TX macro-cell. The signals DATA_TX are then sent to the physical layer by means of the UTOPIA TX macro-cell.
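The segmentation and credit-based sending described for the TX side can be sketched as follows. This is a minimal illustration and not the implementation of the macro-cells: the function names are hypothetical, the trailer layout (one byte CPCS-UU, one byte CPI, two bytes length, four bytes CRC, with padding to a multiple of 48 bytes) is the standard AAL5 format rather than something stated in this text, and the CRC field is left as a zero placeholder instead of being computed.

```python
CELL_PAYLOAD = 48   # payload bytes carried by one ATM cell
TRAILER_LEN = 8     # AAL5 trailer: UU (1) + CPI (1) + length (2) + CRC (4)


def segment_aal5(packet):
    """Split a packet into 48-byte AAL segments (CRC left as a placeholder)."""
    # Pad so that packet + padding + trailer is a multiple of 48 bytes.
    pad_len = (-(len(packet) + TRAILER_LEN)) % CELL_PAYLOAD
    # ASSUMPTION: UU and CPI set to zero; CRC-32 not computed in this sketch.
    trailer = bytes(2) + len(packet).to_bytes(2, "big") + bytes(4)
    pdu = packet + bytes(pad_len) + trailer
    return [pdu[i:i + CELL_PAYLOAD] for i in range(0, len(pdu), CELL_PAYLOAD)]


def send_with_credits(segments, tokens):
    """Send at most `tokens` segments; return (sent, still_pending).

    Hypothetical stand-in for the transmission credits provided by SHAPER.
    """
    return segments[:tokens], segments[tokens:]
```

For example, a 41-byte packet does not fit in one cell together with its trailer, so it yields two segments; with a single token only the first one is released and the second waits for further credits.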

At the RX side the incoming signals DATA_RX are received by the UTOPIA RX macro-cell, re-timed by the ALIGNER RX macro-cell and processed by the ATMU RX macro-cell, which removes the ATM header and provides proprietary labels to AAL segments. The segments are then processed by the AAL RX PROTOCOL CORE macro-cell and are stored by the AAL RX MEMORY MANAGER macro-cell. The reassembled segments form packets that are sent to the destination.

Each macro-cell needs specific signals in order to be properly configured and controlled, e.g. signals carrying the ATM header of a new channel to be activated. The distributed interface of the present invention, made up of the DMI MAIN and DMI PERIPHERAL macro-cells, noticeably simplifies the job of providing user macro-cells with these signals. This is because DMI MAIN plus its set of DMI PERIPHERALs constitute a substantially application-independent interface implementing a plurality of general-purpose subsets of resources available to all kinds of user macro-cells, so that the latter no longer need particular arrangements in order to be connected to this interface. By contrast, a microprocessor interface of the prior art needs a considerable design embedded in the user macro-cells, because its architecture is not based on a consistent, standardizable design outside the user macro-cells that is able to operate the interface in a substantially application-independent way.

With reference to FIG. 10 a second embodiment of the invention is shown. It differs from the embodiment of FIG. 9 mainly in the absence of the block uP/LB BRIDGE at the left side of the BOARD and in the presence of an additional block DMI MAIN2 at the opposite side of the board. In FIG. 10 the microprocessor uP is directly connected to the block DMI MAIN1 inside the ASIC by means of the microprocessor bus uPB. Block DMI MAIN2 has a first side connected to a local bus LB on the back-plane, and the opposite side connected to a bus COMMON-BUS 2, while the block DMI MAIN1 is interposed between the uPB and COMMON-BUS 1 buses; the last bus is the COMMON-BUS of the previous FIG. 9. The first and third clusters of macro-cells are unchanged, while the remaining two clusters each include an additional DMI PERIPHERAL macro-cell, both connected to the COMMON-BUS 2. More precisely, the second cluster includes a DMI PERIPHERAL6 macro-cell connected to AAL TX MEMORY MANAGER and AAL TX PROTOCOL CORE macro-cells instead of the DMI PERIPHERAL2 macro-cell. The fourth cluster includes a DMI PERIPHERAL5 macro-cell connected to the ALIGNER TX macro-cell instead of the DMI PERIPHERAL4 macro-cell.

With reference to FIG. 11 a third embodiment of the invention is shown, which differs from the embodiment of FIG. 9 only in its particular implementation, in which the four clusters of macro-cells and the block DMI MAIN constituting the ASIC of FIG. 9 are implemented as as many FPGAs. The presence of the COMMON-BUS greatly reduces the number of interconnections between DMI PERIPHERALs and relative user developed macro-cells; this makes the new implementation suitable for prototyping the ASIC. Since the architecture is the same for both ASIC and FPGA bread-boarding, no changes are necessary when changing the implementation technology.

With reference to FIG. 12 the COMMON-BUS of the previous FIGS. 9 to 11 is detailed into functionally structured sub buses. This means that some buses might not be physically present in the implementation, but their function must be implemented (e.g. some buses can be multiplexed in time sharing fashion on the same physical support). Each sub bus has either an upstream or a downstream direction: the upstream direction is from DMI MAIN to DMI PERIPHERALs; the downstream direction is from DMI PERIPHERALs to DMI MAIN. Moreover, the upstream buses are of type one2many (one-to-many: only one driver, where the term driver in this context means the physical logic gate which drives the bus with voltage levels), that is with only one transmitter (DMI MAIN) and many receivers (DMI PERIPHERALs). On the contrary, downstream buses are of type many2one (many-to-one: many drivers), that is with many transmitters (DMI PERIPHERALs) and only one receiver (DMI MAIN).
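The one2many and many2one directions just described can be illustrated with a small behavioral model. The sketch below is only an assumption-laden illustration, not the protocol of Appendix B: the class and method names, the address windows, the "read"/"write" command strings and the ("ack", data) reply are all hypothetical. It shows DMI MAIN as the single upstream driver seen by every DMI PERIPHERAL, and the downstream path as a shared bus that at most one DMI PERIPHERAL may drive for a given transaction.

```python
class DmiPeripheral:
    """Hypothetical leaf owning a window of addressable resources."""

    def __init__(self, name, base, size):
        self.name = name
        self.base = base
        self.registers = [0] * size

    def on_upstream(self, command, address, data=None):
        """Receive an upstream command; reply only if the address is ours."""
        offset = address - self.base
        if not 0 <= offset < len(self.registers):
            return None  # not selected: leave the downstream bus undriven
        if command == "write":
            self.registers[offset] = data
            return ("ack", None)
        return ("ack", self.registers[offset])


class DmiMain:
    """Hypothetical root: the only driver of the upstream sub buses."""

    def __init__(self, peripherals):
        self.peripherals = peripherals

    def transact(self, command, address, data=None):
        # Upstream (one2many): every peripheral sees the same command.
        replies = [p.on_upstream(command, address, data) for p in self.peripherals]
        # Downstream (many2one): at most one peripheral may drive the reply.
        drivers = [r for r in replies if r is not None]
        if len(drivers) > 1:
            raise RuntimeError("downstream bus contention")
        return drivers[0] if drivers else None
```

In this model a write to an address inside the window of one leaf changes only that leaf, and an access to an unmapped address returns no reply because no DMI PERIPHERAL drives the downstream bus.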

Because of the particular tree-like configuration of the distributed interface of the invention depicted in FIGS. 9 to 11, in which the block DMI MAIN is similar to the root and the blocks DMI PERIPHERALs to leaves, the two types of blocks are hereinafter also named DMI_ROOT and DMI_LEAFs indifferently. In FIG. 12 an EXTERNAL-BUS is also depicted at the left side of block DMI MAIN; this bi-directional bus is either the microprocessor uPB bus of FIG. 10 or local bus LB of FIG. 11.

The Distributed Microprocessor Interface (DMI) can be master or slave with respect to one EXTERNAL BUS AGENT (EBA) acting on the EXTERNAL-BUS. Slave means that only the EXTERNAL BUS AGENT can initiate a transaction, and block DMI MAIN can only execute the transaction issued by the EXTERNAL BUS AGENT, issuing proper commands towards the interfaced resources embedded in blocks DMI PERIPHERALs. On the contrary, master means that block DMI MAIN can also initiate a transaction with the EXTERNAL BUS AGENT. Block DMI MAIN initiates a transaction with the EXTERNAL BUS AGENT only when a resource embedded in a block DMI PERIPHERAL requests block DMI MAIN to do so. The EXTERNAL BUS AGENT is supposed to be of master type. In the case of a distributed microprocessor interface (DMI) of master type the transaction requester, other than the EXTERNAL BUS AGENT, can be a block DMI PERIPHERAL (more