United States Patent
5872904
McMillen et al.
February 16, 1999
Title
Computer system using a master processor to automatically reconfigure faulty switch node that is detected and reported by diagnostic processor without causing communications interruption
Abstract
A multistage interconnect network (MIN) capable of supporting massive parallel processing, including point-to-point and multicast communications between processor modules (PMs) which are connected to the input and output ports of the network. The network is built using interconnected switch nodes arranged in $2\lceil \log_b N \rceil$ stages, wherein b is the number of switch node input/output ports, N is the number of network input/output ports, and $\lceil \log_b N \rceil$ indicates a ceiling function providing the smallest integer not less than $\log_b N$. The additional stages provide additional paths between network input ports and network output ports, thereby enhancing fault tolerance and lessening contention.
Inventors:
McMillen; Robert J. (Encinitas, CA), Watson; M. Cameron (Los Angeles, CA), Chura; David J. (Redondo Beach, CA)
Assignee:
NCR Corporation (Dayton, OH)
Appl. No.:
08/656,007
Filed:
May 24, 1996
Current U.S. Class:
714/4
370/217
709/239
714/3
714/31
Current International Class:
G06F 11/00 (20060101)
Field of Search:
370/60,406,231,400,392,471,217 395/800,200.15,312,182.02,182.01,183.07,200.09
U.S. Patent Documents
3290446
December 1966
Ceonzo
3317676
May 1967
Ekbergh et al.
3491211
January 1970
Bininda et al.
3581286
May 1971
Beausolell
3582560
June 1971
Banks
3693155
September 1972
Crafton et al.
3963872
June 1976
Hagstrom et al.
4022982
May 1977
Hemdal
4038638
July 1977
Hwang
4074072
February 1978
Christensen et al.
4075693
February 1978
Fox et al.
4081612
March 1978
Hafner
4146749
March 1979
Pepping et al.
4173713
November 1979
Giesken
4177514
December 1979
Rupp
4201889
May 1980
Lawrence et al.
4201891
May 1980
Lawrence et al.
4237447
December 1980
Clark
4247892
January 1981
Lawrence
4251879
February 1981
Clark
4307446
December 1981
Barton et al.
4317193
February 1982
Joel, Jr.
4344134
August 1982
Barnes
4347498
August 1982
Lee et al.
4412285
October 1983
Neches et al.
4417244
November 1983
Melas
4417245
November 1983
Melas
4445171
April 1984
Neches
4456987
June 1984
Wirsing
4466060
August 1984
Riddle
4481623
November 1984
Clark
4484262
November 1984
Sullivan
4486877
December 1984
Turner
4491945
January 1985
Turner
4494185
January 1985
Gunderson et al.
4518960
May 1985
Clark
4523273
June 1985
Adams, III et al.
4540000
September 1985
Bencher
4543630
September 1985
Neches
4550397
October 1985
Turner
4561090
December 1985
Turner
4577308
March 1986
Larson
4621359
November 1986
McMillen
4622632
November 1986
Tanimoto et al.
4623996
November 1986
McMillen
4630258
December 1986
McMillen
4630260
December 1986
Toy et al.
4633394
December 1986
Georgiou et al.
4638475
January 1987
Koike
4651318
March 1987
Luderer
4656622
April 1987
Lea et al.
4661947
April 1987
Lea et al.
4663620
May 1987
Paul et al.
4670871
June 1987
Vaidya
4679186
July 1987
Lea
4695999
September 1987
Lebizay
4701906
October 1987
Ransom et al.
4706150
November 1987
Lebizay et al.
4707781
November 1987
Sullivan
4731825
March 1988
Wojcinski et al.
4731878
March 1988
Vaidya
4734907
March 1988
Turner
4740954
April 1988
Cotton
4742511
May 1988
Johnson
4745593
May 1988
Stewart
4761780
August 1988
Bingham
4766534
August 1988
DeBenedictis
4780873
October 1988
Mattheyses
4782478
November 1988
Day, Jr. et al.
4785446
November 1988
Dias et al.
4809362
February 1989
Claus et al.
4811210
March 1989
McAulay
4814973
March 1989
Hillis
4814979
March 1989
Neches
4814980
March 1989
Peterson
4817084
March 1989
Arthurs et al.
4829227
May 1989
Turner
4833468
May 1989
Larson et al.
4833671
May 1989
Becker et al.
4845722
July 1989
Kent et al.
4845736
July 1989
Posner et al.
4845744
July 1989
DeBenedictis
4847755
July 1989
Morrison et al.
4849751
July 1989
Barber et al.
4860201
August 1989
Stolfo et al.
4864558
September 1989
Imagawa et al.
4866701
September 1989
Giacopelli et al.
4925311
May 1990
Neches et al.
4945471
July 1990
Neches
4956772
September 1990
Neches
4962497
October 1990
Ferenc et al.
5006978
April 1991
Neches
5022025
June 1991
Urushidani et al.
5088091
February 1992
Schroeder et al.
5119369
June 1992
Tanabe et al.
5121384
June 1992
Ozeki et al.
5199027
March 1993
Barri
5214642
May 1993
Kunimoto et al.
5522046
May 1996
McMillen et al.
Other References
R. J. McMillen, "A Study of Multistage Interconnection Networks: Design, Distributed Control, Fault Tolerance, and Performance", PhD Thesis, Purdue University, Dec. 1982.
Dr. Philip M. Neches, "THE YNET: An Interconnect Structure for a Highly Concurrent Data Base Computer System", Teradata Corporation, 1988.
R. D. Rettberg, W. R. Crowther, P. P. Carvey and R. S. Tomlinson, "The Monarch Parallel Processor Hardware Design", Computer, Apr. 1990, pp. 18-30.
L. R. Goke and G. J. Lipovski, "Banyan Networks for Partitioning Multiprocessor Systems", Proceedings of the First Annual Symposium on Computer Architecture, 1973, pp. 21-28.
T. Feng, "A Survey of Interconnection Networks", Computer, Dec. 1981, pp. 12-27.
D. P. Agrawal, "Testing and Fault Tolerance of Multistage Interconnection Networks", Computer, Apr. 1982, pp. 41-53.
Burroughs Corporation, "Final Report: Numerical Aerodynamic Simulation Facility; Feasibility Study", Mar. 1979.
G. F. Pfister, W. C. Brantley, D. A. George, S. L. Harvey, W. J. Kleinfelder, K. P. McAuliffe, E. A. Melton, V. A. Norton, and J. Weiss, "The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture", Proceedings of the 1985 International Conference on Parallel Processing, 1985, pp. 764-771.
G. F. Pfister and V. A. Norton, "Hot Spot Contention and Combining in Multistage Interconnection Networks", Proceedings of the 1985 International Conference on Parallel Processing, 1985, pp. 790-797.
W. C. Brantley, K. P. McAuliffe, and J. Weiss, "RP3 Processor-Memory Element", Proceedings of the 1985 International Conference on Parallel Processing, pp. 782-789.
W. Crowther, J. Goodhue, E. Starr, R. Thomas, W. Milliken and T. Blackadar, "Performance Measurements on a 128-node Butterfly.TM. Parallel Processor", Proceedings of the 1985 International Conference on Parallel Processing, 1985, pp. 531-540.
A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph and M. Snir, "The NYU Ultra Computer-Designing an MIMD Shared Memory Parallel Computer", IEEE Transactions on Computers, vol. C-32, No. 2, Feb. 1983, pp. 175-189.
Leiserson, "Transactions on Computers," IEEE, vol. C-34, No. 10, Oct. 1985.
Primary Examiner:
Luu; Le Hien
Attorney, Agent or Firm:
Gates & Cooper
Parent Case Text
This is a continuation of application Ser. No. 08/253,868, filed Jun. 3, 1994, Pat. No. 5,522,046, which is a continuation of application Ser. No. 07/694,110, U.S. Pat. No. 5,321,813, filed May 1, 1991, and issued Jun. 14, 1994.
Claims
What is claimed is:
1. A method of communications in a computer system, comprising the steps of:
(a) transmitting messages between a plurality of processors in the computer system using a plurality of separate and distinct networks, wherein each of the networks is comprised of a plurality of active logic switch nodes;
(b) detecting and reporting any errors that occur within each of the networks during the transmitting step using one or more diagnostic processors coupled to the switch nodes; and
(c) automatically reconfiguring one or more switch nodes in one of the networks using a master processor coupled to the diagnostic processors when an error is detected in the network, while using only the other one of the networks for transmitting messages during the reconfiguration step to avoid interrupting communications in the system.
2. The method of claim 1, wherein each of the switch nodes further comprises a tag mapping table associated with each input port of a switch node, and the method further comprises the step of interpreting a routing tag to determine which output port of the switch node to select in order to route a connect request correctly through the reconfigured network.
3. The method of claim 2, wherein the tag mapping table comprises a memory array with a plurality of entries, and the method further comprises the step of translating the routing tag to an output port selection, wherein the memory array provides a one-to-one mapping between a logical output port selection provided by the routing tag and a physical output port selection.
4. The method of claim 3, further comprising the step of deriving each entry in the memory array from an appropriate field of the routing tag according to a stage of the switch node within the reconfigured network.
5. The method of claim 2, wherein the method further comprises the step of mapping the routing tag to a physical output port selection based on the way in which the switch nodes in the reconfigured network are interconnected.
6. The method of claim 2, further comprising the step of initializing each entry in the tag mapping table with a default value, so that a physical output port selection is equal to a logical output port selection, thereby providing default values that are appropriate for fully configured networks.
7. The method of claim 6, further comprising the step of overlaying the default values when the master processor coupled to the reconfigured network has determined an actual topology for the switch nodes in the reconfigured network.
8. The method of claim 1, further comprising the step of indicating which input ports of the switch node are operational via input enable vectors.
9. The method of claim 1, further comprising the step of indicating which output ports of the switch node are operational via output enable vectors.
10. A computer system, comprising:
(a) a plurality of processors transmitting messages therebetween using a plurality of separate and distinct networks, wherein each of the networks is comprised of a plurality of active logic switch nodes;
(b) one or more diagnostic processors, coupled to the switch nodes, for detecting and reporting any errors that occur within each of the networks during the transmitting of messages; and
(c) a master processor, coupled to the diagnostic processors, for automatically reconfiguring one or more switch nodes in one of the networks via the diagnostic processors when an error is detected in the network, while using only the other one of the networks for transmitting messages during the reconfiguration to avoid interrupting communications in the system.
11. The computer system of claim 10, wherein each of the switch nodes further comprises a tag mapping table associated with each input port of a switch node, and the switch nodes further comprising means for interpreting a routing tag to determine which output port of the switch node to select in order to route a connect request correctly through the reconfigured network.
12. The computer system of claim 11, wherein the tag mapping table comprises a memory array with a plurality of entries, and the switch nodes further comprising means for translating the routing tag to an output port selection, wherein the memory array provides a one-to-one mapping between a logical output port selection provided by the routing tag and a physical output port selection.
13. The computer system of claim 12, wherein the switch nodes further comprise means for deriving each entry in the memory array from an appropriate field of the routing tag according to a stage of the switch node within the reconfigured network.
14. The computer system of claim 11, wherein the switch nodes further comprises means for mapping the routing tag to a physical output port selection based on the way in which the switch nodes in the reconfigured network are interconnected.
15. The computer system of claim 11, wherein the switch nodes further comprise means for initializing each entry in the tag mapping table with a default value, so that a physical output port selection is equal to a logical output port selection, thereby providing default values that are appropriate for fully configured networks.
16. The computer system of claim 15, wherein the diagnostic processors further comprise means for overlaying the default values when the master processor has determined an actual topology for the switch nodes in the reconfigured network.
17. The computer system of claim 10, wherein the switch nodes further comprise input enable vectors for indicating which input ports of the switch node are operational.
18. The computer system of claim 10, wherein the switch nodes further comprise output enable vectors for indicating which output ports of the switch node are operational.
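As an informal illustration of the tag mapping table recited in claims 2-7 and 11-16, the following Python sketch models one table for one input port of an 8x8 switch node. It is a minimal sketch under stated assumptions and not the claimed implementation: the class name `TagMappingTable`, the method names, and the routing tag layout (one 3-bit field per stage, with stage 0 in the low-order bits) are hypothetical and do not appear in the claims.

```python
class TagMappingTable:
    """Hypothetical model of a per-input-port tag mapping table."""

    def __init__(self, num_ports: int = 8):
        self.num_ports = num_ports
        # Default values: physical output port equals logical output port,
        # which is appropriate for a fully configured network.
        self.entries = list(range(num_ports))

    def overlay(self, mapping):
        """Replace the defaults once the actual topology is known.

        The mapping must be one-to-one between logical and physical ports.
        """
        if sorted(mapping) != list(range(self.num_ports)):
            raise ValueError("mapping must be a permutation of the ports")
        self.entries = list(mapping)

    def select(self, routing_tag: int, stage: int, bits_per_field: int = 3) -> int:
        """Translate the routing tag field for this stage to a physical port.

        Assumed layout: stage s uses bits [s*bits_per_field, (s+1)*bits_per_field).
        """
        mask = (1 << bits_per_field) - 1
        logical = (routing_tag >> (stage * bits_per_field)) & mask
        if logical >= self.num_ports:
            raise ValueError("logical port out of range")
        return self.entries[logical]
```

With the default entries, a routing tag of `0o52` selects output port 2 at stage 0 and output port 5 at stage 1. After `overlay([0, 1, 3, 2, 4, 5, 6, 7])`, the same tag selects physical port 3 at stage 0.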
Description
TABLE OF CONTENTS
BACKGROUND OF THE INVENTION
SUMMARY OF THE INVENTION
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
1. GENERAL DESCRIPTION
2. NETWORK TOPOLOGY
3. SWITCH NODES
4. NETWORK CONTROLLERS
5. DIAGNOSTIC PROCESSORS
6. PACKAGING
7. TYPE-A BOARD
8. TYPE-B BOARD
9. TYPE-C BOARD
10. COMMUNICATION MODULE ASSEMBLY
11. SIMPLIFIED CABLING
12. SWITCH NODE ADDRESSING
13. AUTOMATIC PROCESSOR ID ASSIGNMENT
14. DYNAMIC CONFIGURATION
15. SIMPLIFIED BACKPLANE ROUTING
16. CONNECTION PROTOCOL
17. DISCONNECTION PROTOCOL
18. MONOCAST LOAD BALANCING
19. MONOCAST NON-BLOCKING
20. MONOCAST BLOCKING
21. MONOCAST BLOCKING WITHOUT LOAD BALANCING
22. MONOCAST PIPELINE
23. MONOCAST NON-PIPELINE
24. CONTROLLER SOFTWARE
25. SUPERCLUSTERS
26. MULTICAST
27. FORWARD CHANNEL COMMANDS
28. BACK CHANNEL REPLIES
29. NETWORK APPLICATIONS
30. CONCLUSION
TABLE I
TABLE II
TABLE III
TABLE IV
TABLE V
CLAIMS
ABSTRACT
BACKGROUND OF THE INVENTION
1. FIELD OF THE INVENTION
This invention relates in general to computer networks, and in particular to a scalable multi-stage interconnect network 14 for multiprocessor computers.
2. DESCRIPTION OF RELATED ART
Parallel processing is considered an advantageous approach for increasing processing speeds in computer systems. Parallel processing can provide powerful communications and computer systems which can handle complex problems and manipulate large databases quickly and reliably.
One example of parallel processing can be found in U.S. Pat. No. 4,412,285, issued Oct. 25, 1983, to Neches et al., incorporated by reference herein. This patent describes a system using a sorting network to intercouple multiple processors so as to distribute priority messages to all processors.
Further examples of parallel processing can be found in U.S. Pat. No. 4,445,171, issued Apr. 24, 1984, to Neches, U.S. Pat. No. 4,543,630, issued Sep. 24, 1985, to Neches, and U.S. Pat. No. 4,814,979, issued Mar. 21, 1989, to Neches, all of which are incorporated by reference herein. These patents describe a multiprocessor system which intercouples processors with an active logic network having a plurality of priority determining nodes. Messages are applied concurrently to the network in groups from the processors and are sorted, using the data content of the messages to determine priority, to select a single or common priority message which is distributed to all processors with a predetermined total network delay time.
Communication within parallel processing systems such as those described above is typically classified as either tightly coupled wherein communication occurs through a common memory or loosely coupled wherein communication occurs via switching logic and communications paths. Various topologies and protocols for loosely coupled processors have been proposed and used in the prior art. These topologies tend to be grouped into two categories: static and dynamic.
Static topologies provide communication paths between processors which cannot be reconfigured. Examples of static topologies include linear arrays, rings, stars, trees, hypercubes, etc.
Dynamic topologies permit dynamic reconfiguration of communication paths between processors using switching elements within the network. Examples of dynamic topologies include single stage networks and multistage interconnect networks (MINs).
A single stage network has one stage of switching elements, such that information can be re-circulated until it reaches the desired output port. A MIN has a plurality of switching element stages capable of connecting any input port of the network to any output port.
In general, MINs consist of several stages of switching elements or switch nodes that are wired together according to a regular pattern. Typically, each switch node is a small crossbar switch that usually has an equal number of inputs and outputs, e.g., a bxb switch node. Prior art MINs typically consist of $\log_b N$ stages, wherein b is the number of input/output ports of a switch node, and N is the number of input/output ports of a network. Typically, such MINs are therefore constructed from the smallest number of links and switch nodes that allows any network input port to be connected to any network output port.
Prior attempts at implementing MINs suffer from several disadvantages. One disadvantage arises because each network input/output port pair typically has only one way to be connected, thereby making the MIN susceptible to internal contention. Internal contention occurs when two paths require the same link even though the paths may or may not be to the same network output port.
Another disadvantage is lessened reliability due to the number and complexity of components. If a fault occurs, it is often difficult to determine where the problem lies. Further, it may be impossible to reconfigure the system to exclude the failed component or service the system without shutting it down, thereby leaving the system inoperable until the problem is corrected.
Another disadvantage is complex, expensive, and time-consuming manufacturing and installation requirements. For large network configurations, cabling may be unmanageable due to the logistics of making sure every component is correctly cabled and plugged into the correct connector.
Still another disadvantage involves diminishing bandwidth. The bandwidth available to each processor tends to decrease as the system size grows.
SUMMARY OF THE INVENTION
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a multistage interconnect network (MIN) capable of supporting massive parallel processing, including point-to-point and multicast communications between processor modules (PMs) which are connected to the input and output ports of the network. The network is built using interconnected bxb switch nodes arranged in $\lceil \log_b N \rceil + 1$ (or more) stages, wherein b is the number of input/output ports of a switch node, N is the number of input/output ports of a network, and $\lceil \log_b N \rceil$ indicates a ceiling function providing the smallest integer not less than $\log_b N$. The additional stages provide additional paths between network input ports and network output ports, thereby enhancing fault tolerance and lessening contention.
The present invention provides numerous advantages. One advantage is reliability. The system is designed to keep working even when components fail by automatically reconfiguring itself when a fault is detected.
Still another advantage is serviceability. The error reporting method isolates faults to prevent them from propagating throughout the network.
Still another advantage is manufacturability. For large system configurations, cabling could become unmanageable. However, the design of the present invention, along with flexible cable connection rules, makes the problem tractable for large systems and nonexistent for small systems.
Still another advantage is simple installation. Any processor can be plugged into any available receptacle. This eliminates a source of errors by dropping the need to make sure every cable is plugged into the correct connector. All other systems we know of have this cabling constraint.
Still another advantage is high performance per processor. The high connectivity topology, extra stages of switch nodes, back-off capability, pipelining operation, back channel, and multicast window features combine to provide a high speed connection capability for each processor regardless of the number of processors in the system. In other systems, the bandwidth available to each processor tends to decrease as the system size grows.
BRIEF DESCRIPTION OF THE DRAWINGS
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
FIG. 1 illustrates the components of the present invention, which comprises a general purpose multiprocessor computer system capable of massive parallel processing;
FIG. 2 provides one example illustrating how the switch nodes are interconnected to implement a network;
FIG. 3 illustrates the permutation function between stage 0 and stage 1 for a network having between 9 and 64 network I/O ports;
FIG. 4 describes the components of an 8x8 switch node according to the present invention;
FIG. 5 is a block diagram describing the components of the controllers that connect each PM to the networks;
FIG. 6 describes a Type-A board used in the network;
FIG. 7 describes a Type-B board used in the network;
FIG. 8 describes a Type-C board used in the network;
FIG. 9 illustrates a network comprising a single Communication Module Assembly (CMA/A), which supports between 2 and 64 network I/O ports;
FIG. 10 describes circuit switching within the CMA/A wherein a Universal Wiring Pattern (UWP) between stage 0 and stage 1 switch nodes is embedded in a backplane;
FIG. 11 illustrates a network 14 having CMA/As and CMA/Bs, which support between 65 and 512 network I/O ports;
FIG. 12 illustrates a network 14 having CMA/As and CMA/Cs, which support between 65 and 4096 network I/O ports;
FIGS. 13 (a) and (b) illustrate a cable harness assembly;
FIG. 14 illustrates a practical implementation of the cable harness assembly shown in FIGS. 13 (a) and (b);
FIG. 15 shows a simplified wiring diagram describing how the switch nodes are connected in a network having 128 network I/O ports;
FIGS. 16 (a), (b), (c) and (d) provide simplified wiring diagrams describing the expansion from 64 PMs 12 to 65-128 PMs;
FIG. 17 shows the cabling for the situation in which there are 512 network I/O ports in the network;
FIG. 18 shows the cabling for the situation in which there are more than 512 network I/O ports in the network;
FIG. 19 shows the cabling for the situation in which there are 1024 network I/O ports in the network;
FIG. 20 shows the largest possible configuration of 4096 network I/O ports using eight cabinets to house the network;
FIG. 21 is a flow chart describing the steps required for configuring the network;
FIG. 22 is a flow chart describing the steps required for reconfiguring the network when a fault occurs;
FIG. 23 illustrates the paths traversed through the network by a monocast connect command;
FIG. 24 illustrates the software tasks executed by the network controllers;
FIG. 25 illustrates the paths traversed through the network by a multicast connect command;
FIG. 26 illustrates one possible application of the present invention, which comprises a general purpose multiprocessor computer system capable of massive parallel processing.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
In the following description of the preferred embodiment, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration a specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.
1. GENERAL DESCRIPTION
FIG. 1 illustrates the components of the present invention, which comprises a general purpose multiprocessor computer system 10 capable of massively parallel processing. The components illustrated in FIG. 1 include processor modules (PMs) 12, networks 14, switch nodes 16, controllers 18, network I/O ports 20, optical transceivers 22, optical fibers 24, Transparent Asynchronous Transceiver Interface (TAXI) transceivers 26, redundant master clocks 28, bounce-back points 30, forward channels 32, and back channels 34.
The PMs 12 are common platform processor modules which communicate with each other by means of redundant networks 14. However, it is envisioned that the network 14 of the present invention could be used for communications purposes in a large number of different applications. Thus, those skilled in the art will recognize that any number of agents of various types, e.g., memory devices, peripheral devices, etc., could be substituted for the PMs 12 shown.
The system 10 may use redundant networks 14 (labeled network A and network B in FIG. 1) for enhanced fault tolerance and increased bandwidth. If one of the networks 14 is not available, then another network 14 can take over, to allow for graceful degradation of the system 10 in the presence of malfunctions. Software executed by the PMs 12 handles the added complexity of redundant networks 14 and automatically load levels between operative networks 14. The software also supports fault detection and switching in the event of a failure of one of the networks 14.
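The load leveling and failover behavior described above can be sketched informally as follows. This is a minimal illustration and not the software of the preferred embodiment: the class name `NetworkSelector`, its methods, and the round-robin policy are assumptions, since the text states only that the software load levels between operative networks 14 and switches over when one network 14 fails.

```python
class NetworkSelector:
    """Hypothetical chooser that spreads traffic over operative networks."""

    def __init__(self, names):
        self._names = list(names)
        self._operative = {name: True for name in self._names}
        self._next = 0  # index of the next network to try

    def set_operative(self, name, ok):
        """Mark a network as available (True) or unavailable (False)."""
        self._operative[name] = ok

    def choose(self):
        """Return the next operative network in round-robin order."""
        for _ in range(len(self._names)):
            name = self._names[self._next]
            self._next = (self._next + 1) % len(self._names)
            if self._operative[name]:
                return name
        raise RuntimeError("no operative network available")
```

With networks "A" and "B" both operative, successive calls to `choose()` alternate between them. After `set_operative("A", False)`, every call returns "B" until network A is marked operative again.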
Each network 14 is a multistage interconnect network 14 (MIN) that employs active logic switch nodes 16. In the preferred embodiment, the switch nodes 16 have eight input ports which can be connected to any of eight output ports to effectuate the switching functions. (In the following description, the term "switch node 16 I/O port" is often used to refer to a pair of corresponding, i.e., similarly numbered, input and output ports of a switch node 16.) A plurality of switch nodes 16 are interconnected in a plurality of stages to provide the paths between the network input ports and the network output ports. (In the following description, the term "network I/O port 20" is often used to refer to a pair of corresponding, i.e., similarly numbered, input and output ports of a network 14. Typically, a network I/O port will interface to one PM 12, although this is not required to practice the present invention.)
In the preferred embodiment, there are more than $\lceil \log_b N \rceil$ stages in the network 14, wherein b is the number of I/O ports of a switch node 16, N is the number of network I/O ports 20, and $\lceil \log_b N \rceil$ indicates a ceiling function providing the smallest integer not less than $\log_b N$. (Typically, a switch node 16 will have the same number of input ports and output ports, although this is not required to practice the present invention. If the number of input ports and output ports is not identical, then the above equation would become $\log_{(a,b)} N$, wherein a is the number of switch node 16 input ports and b is the number of switch node 16 output ports.) The additional stages provide additional communication paths between any network input port and network output port, thereby enhancing fault tolerance and lessening contention.
Each network 14 is logically full-duplex. The bandwidth of the network 14 is not limited by the bandwidth of any particular switch node. In fact, the bandwidth of the network 14 increases as the number of network I/O ports 20 increases due to the increased number of paths between switch nodes 16. Functionally, the network 14 provides a plurality of possible interconnection paths for a circuit, from a sending PM 12 to a set (one or more) of receiving PMs 12.
Each network 14 automatically detects and reports any errors that occur during operation, even if there is no traffic. The network 14 is able to detect and isolate errors automatically without propagating them, which improves serviceability. The network 14 can be automatically reconfigured when a fault is detected, without interrupting the operation of the system 10 and with minimal performance degradation after reconfiguration.
Communications between the PMs 12 are conducted in two basic modes: point-to-point and multicast. In point-to-point communications, a PM 12 transmits a connect command to another PM 12. The connect command travels through a forward channel 32 in the network 14 to the receiving PM 12. The receiving PM 12 returns a reply to the sending PM 12 through a back channel 34. Once the connection is made to the receiving PM 12, the sending PM 12 transmits its messages, and then terminates the connection when the transmission is done. The network 14 will support many such point-to-point communications between different pairs of PMs 12 at the same time. In the absence of conflicts, all PMs 12 could communicate at the same time.
In the second, or multicast, mode of communications, a single PM 12 can broadcast a message to all of the other PMs 12 or a predefined group of PMs 12. The predefined groups of PMs 12 are called "superclusters" and multicast commands within different superclusters can occur simultaneously. The sending PM 12 transmits its multicast command which propagates through the forward channel 32 to all of the PMs 12 or the group of PMs 12. The PMs 12 that receive multicast messages reply to them by transmitting, for example, their current status through the back channel 34. The network 14 can function to combine the replies in various ways.
Each PM 12 has at least one separate controller 18 for interfacing to each network 14. There is no limit on the number of controllers 18 that connect a PM 12 to a network 14 if additional bandwidth is desired. Transparent Asynchronous Transceiver Interface (TAXI) transceivers 26 are used to serialize and de-serialize data for transmission between the controllers 18 and the network 14 over optical fiber 24. The TAXI transceivers 26 convert parallel data into a high speed serial form that encodes clock information into the data stream, and vice versa. The controller 18 outputs a forward channel 32 consisting of eight bits of data plus a single bit of parity, and a one bit back channel 34 associated with the receive channel, to the TAXI transceiver 26. The controller 18 receives a forward channel 32 consisting of eight bits of data plus a single bit of parity, and a one bit back channel 34 associated with the transmit channel, from the TAXI transceiver 26. On transmit, the TAXI transceiver 26 converts the 10 bits of parallel data into bit serial data. On receive, the TAXI transceiver 26 converts the bit serial data back into 10 bits of parallel data and recovers the clock. The back channels 34 are only one bit so they can interface to the TAXI transceivers 26 with the forward channels 32, thus providing more efficient packaging.
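The 10-bit parallel word described above (eight data bits, one parity bit, and one back channel bit) can be illustrated with the following Python sketch. It is offered only as an illustration: the bit positions, the use of even parity, and the function names `parity_bit`, `pack_word`, and `unpack_word` are assumptions, because the text does not specify the parity sense or the bit ordering presented to the TAXI transceiver 26.

```python
def parity_bit(data: int) -> int:
    """Even-parity bit for an 8-bit value (the parity sense is assumed)."""
    return bin(data & 0xFF).count("1") & 1


def pack_word(data: int, back: int) -> int:
    """Pack data, parity, and the back channel bit into a 10-bit word.

    Assumed layout: bits 0-7 data, bit 8 parity, bit 9 back channel.
    """
    if not 0 <= data <= 0xFF:
        raise ValueError("data must fit in eight bits")
    if back not in (0, 1):
        raise ValueError("back channel is a single bit")
    return (back << 9) | (parity_bit(data) << 8) | data


def unpack_word(word: int):
    """Split a 10-bit word into (data, parity_ok, back)."""
    data = word & 0xFF
    parity = (word >> 8) & 1
    back = (word >> 9) & 1
    return data, parity == parity_bit(data), back
```

Under these assumptions, `pack_word(0xB1, 1)` yields 689, and `unpack_word(689)` returns `(177, True, 1)`. Flipping any single data bit of the packed word causes the parity check to report `False`.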
2. NETWORK TOPOLOGY
FIG. 2 provides one example illustrating how the switch nodes 16 are interconnected to implement a network 14. In the preferred embodiment, the 8x8 switch nodes 16 are arranged in $2\lceil \log_8 N \rceil$ stages, wherein N is the number of network I/O ports 20 and $\lceil \log_8 N \rceil$ indicates a ceiling function providing the smallest integer not less than $\log_8 N$. Thus, for a network 14 having 8 or fewer network I/O ports 20, there are $2\lceil \log_8 8 \rceil = 2$ stages; for a network 14 having between 9 and 64 network I/O ports 20, there are $2\lceil \log_8 64 \rceil = 4$ stages; for a network 14 having between 65 and 512 network I/O ports 20, there are $2\lceil \log_8 512 \rceil = 6$ stages; and for a network 14 having between 513 and 4096 network I/O ports 20, there are $2\lceil \log_8 4096 \rceil = 8$ stages. The additional stages provide additional communication paths between any network input port and network output port, thereby enhancing fault tolerance and lessening contention.
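The stage counts listed above can be checked with a short Python sketch. The function names `ceil_log` and `stage_count` are illustrative and not part of the specification. Integer arithmetic is used instead of a floating-point logarithm so that exact powers of 8 such as 512 and 4096 are not affected by rounding.

```python
def ceil_log(n: int, b: int) -> int:
    """Return the smallest integer k such that b**k >= n."""
    if n < 1 or b < 2:
        raise ValueError("require n >= 1 and b >= 2")
    k, reach = 0, 1
    while reach < n:
        reach *= b
        k += 1
    return k


def stage_count(num_ports: int, b: int = 8) -> int:
    """Number of stages, twice the ceiling of log base b of num_ports."""
    if num_ports < 2:
        raise ValueError("a network needs at least two I/O ports")
    return 2 * ceil_log(num_ports, b)
```

For example, `stage_count(64)` returns 4 and `stage_count(65)` returns 6, matching the ranges given in the text.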
As indicated in FIG. 2, the stage numbers increment from left to right beginning at 0, until a "bounce-back point" 30 is reached, at which point the stage numbers decrement from left to right back to 0. The bounce-back point 30 indicates the point where the stages of the network 14 are physically folded. Folding the network 14 allows corresponding switch nodes 16 in similarly numbered stages on either side of the bounce-back point 30 to be located adjacent to each other to simplify packaging and to minimize signal path lengths (especially to/from the PMs 12). The folded network 14 is illustrated by FIG. 1, and FIGS. 6, 7, and 8 described further hereinafter in conjunction with Type-A, -B, and -C boards.
Each 8x8 switch node 16 used in the preferred embodiment has eight input ports and eight output ports, wherein each port interfaces to a 9-bit (8-bits of data and 1 bit of parity) forward channel 32 and a 1-bit back channel 34. (For the sake of brevity and clarity, however, FIG. 2 represents each forward channel 32 and back channel 34 pair with a single line, wherein the direction of the forward channel 32 is indicated by an arrow and the direction of the back channel 34 is opposite the arrow).
Within any 8x8 switch node 16, any input port can be connected to any output port by the function of the logic within the switch node 16. Up to eight PMs 12 may be applied to the eight input ports of each of the "left" stage 0 switch nodes 16 on the left side of the bounce-back point 30 in FIG. 2; these are the network input ports. Each of the output ports from the "left" stage 0 switch nodes 16 communicates bidirectionally with a different one of the "left" stage 1 switch nodes 16 on the left side of the bounce-back point 30 in FIG. 2, so that any one of the "left" stage 0 switch nodes 16 can communicate with any one of the "left" stage 1 switch nodes 16. (For the sake of brevity and clarity, however, FIG. 2 shows only a portion of the interconnections between switch nodes 16). Each of the output ports from the "left" stage 1 switch nodes 16 communicates bidirectionally with a corresponding "right" stage 1 switch node 16 on the right side of the bounce-back point 30 in FIG. 2. Each of the output ports from the "right" stage 1 switch nodes 16 communicates bidirectionally with a different one of the "right" stage 0 switch nodes 16 on the right side of the bounce-back point 30 in FIG. 2, so that any one of the "right" stage 1 switch nodes 16 can communicate with any one of the "right" stage 0 switch nodes 16; these are the network output ports. Thus, any PM 12 connected to a "left" stage 0 switch node 16 can communicate with any PM 12 connected to a "right" stage 0 switch node 16 by appropriate switching of the stage 0 and stage 1 switch nodes 16.
The pattern of interconnections between the stage 0 and stage 1 switch nodes 16 in FIG. 2 is termed a Universal Wiring Pattern (UWP). This pattern is "universal" because the interconnections between different stages in any size network 14 consist of one or more copies of the UWP. (Note that the pattern of interconnections between similarly numbered stages, i.e., across the bounce-back point 30, is not a UWP, but instead consists of a "straight" interconnection wherein the output ports of a switch node 16 communicate bidirectionally only with the input ports of a corresponding switch node.)
For a network 14 of size $N = 8^n$, $n > 1$, wherein n indicates the number of stages in the network and N indicates the number of network I/O ports 20 and thus the number of PMs 12 that can be attached thereto, the number of copies of the UWP between each pair of stages is $8^{n-2}$.
For 8 or fewer network I/O ports 20 (n=1), there is only one stage and thus no UWP.
For 9 to 64 network I/O ports 20 (n=2), there is one ($8^{2-2}$) copy of the UWP between each pair of stages.
For 65 to 512 network I/O ports 20 (n=3), there are eight ($8^{3-2}$) copies of the UWP between each pair of stages. In the preferred embodiment, the patterns do not overlap between Stages 0 and 1; the patterns are stretched out and overlap between Stages 1 and 2.
For 513 to 4096 network I/O ports 20 (n=4), there are 64 ($8^{4-2}$) copies of the UWP between each pair of stages. In the preferred embodiment, the patterns do not overlap between Stages 0 and 1; the patterns are stretched out and overlap between Stages 1 and 2; the patterns do not overlap between Stages 2 and 3.
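The copy counts listed above follow directly from the exponent n-2. As a minimal sketch (the function name `uwp_copies` is an illustrative assumption, not a term from the disclosure):

```python
def uwp_copies(n: int) -> int:
    """Copies of the Universal Wiring Pattern between each pair of stages.

    n is the number of stages on one side of the bounce-back point.
    With a single stage (n = 1) there is no inter-stage wiring, hence 0.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return 0 if n == 1 else 8 ** (n - 2)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(f"n={n}: {uwp_copies(n)} copies")
```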
The UWP is a function of the switch node 16 size and is generated by a permutation function that identifies which ports to connect between switch nodes 16 in different stages. Mathematical properties of these interconnections simplify cabling in the network 14.
Because 8x8 switch nodes 16 are used, the number of network I/O ports 20 is $N = 8^n$, $n \in \{1, 2, 3, \ldots\}$, and there are n Stages numbered from 0 to n-1. The switch nodes 16 in each Stage are numbered from top to bottom from 0 to N/8-1. The input/output ports of the switch nodes 16 in each Stage can be numbered from top to bottom from 0 to N-1, which are the ports' Levels. The ports on each side of a given switch node 16 are numbered from 0 to 7 from top to bottom.
There are two ways to reference a specific input/output port on a specific switch node 16. The first method is by (Stage:Level) and the second is by the triplet (Stage:Switch-Node-Number:Switch-Node-Port-Number). For example, in a network 14 of N=512 network I/O ports 20 (n=3), let S be the Stage number and X be the Level number, wherein X is an arbitrary number, $0 \le X < N$, represented using octal digits as $x_{n-1} \ldots x_1 x_0$, where $0 \le x_i < 8$ and $0 \le i < n$. Therefore, $(S : x_2 x_1 x_0)$ is the reference by the first method and $(S : x_2 x_1 : x_0)$ is the reference by the second method.
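The two referencing methods are related by simple base-8 arithmetic: the least significant octal digit of the Level is the port number on the switch node, and the remaining digits form the switch node number. The following minimal sketch illustrates the conversion; the function names are illustrative assumptions and do not appear in the disclosure.

```python
from typing import List, Tuple


def level_to_triplet(stage: int, level: int) -> Tuple[int, int, int]:
    """Convert (Stage:Level) to (Stage:Switch-Node-Number:Switch-Node-Port-Number)."""
    return stage, level // 8, level % 8


def triplet_to_level(node: int, port: int) -> int:
    """Convert a switch node number and port number back to a Level."""
    return node * 8 + port


def octal_digits(level: int, n: int) -> List[int]:
    """Octal digits of a Level, most significant first (x_{n-1} ... x_1 x_0)."""
    return [(level >> (3 * i)) & 7 for i in reversed(range(n))]


if __name__ == "__main__":
    # Level 210 (octal) in an n = 3 network: switch node 21 (octal), port 0.
    print(level_to_triplet(0, 0o210), octal_digits(0o210, 3))
```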
It can be shown that the pattern of connections between each Stage is completely specified by permuting the digits of the Level number. In the general case, for all X, $0 \le X < N$, the total set of switch node 16 output ports numbered $(S : x_{n-1} \ldots x_1 x_0)$ are connected to the switch node 16 input ports $(S+1 : \mathrm{PERMUTE}^{n}_{S}\{x_{n-1} \ldots x_1 x_0\})$. The permutation function is subscripted with an "S" to indicate that the function is associated with a specific Stage, and typically, is different in each Stage. The "n" superscript refers to the number of Stages in the network 14.
For a network 14 of 8 or fewer network I/O ports 20 (n=1) there is no permutation function, because only two Stage 0 switch nodes 16 are used.
For a network 14 of between 9 and 64 network I/O ports (n=2) there is only one possible permutation function between Stage 0 and Stage 1: $\mathrm{PERMUTE}^{2}_{0}\{x_1 x_0\} = x_0 x_1$. To see how this works, examine FIG. 3. The Level numbers are shown at the ports on the extreme left and right sides of FIG. 3. Consider the second output from switch node 16 #3 in Stage 0, i.e., (0:3:1). It is at Level $25_{10}$, which is $31_8$. To calculate which input it should be connected to in Stage 1, reverse the octal digits to obtain $13_8$, which is Level $11_{10}$. This process can be repeated for each Level from 0 to 63 to obtain a table enumerating the connections.
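The table of connections for the n=2 case can be generated mechanically by swapping the two octal digits of each Level. The sketch below is illustrative only; the names `permute_2_0` and `connection_table` are assumptions introduced for this example.

```python
def permute_2_0(level: int) -> int:
    """Swap the two octal digits of a Level in a 64-port (n = 2) network."""
    x1, x0 = divmod(level, 8)
    return x0 * 8 + x1


# Stage 0 output Level -> Stage 1 input Level, for all 64 Levels.
connection_table = {level: permute_2_0(level) for level in range(64)}

if __name__ == "__main__":
    # Level 25 (31 octal) connects to Level 11 (13 octal), as in the text.
    print(connection_table[25])
    for out_level, in_level in connection_table.items():
        print(f"(0:{out_level:02o}) -> (1:{in_level:02o})")
```

Because swapping two digits is its own inverse, every Stage 1 input Level appears exactly once in the table.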
For a network 14 of between 65 and 512 network I/O ports (n=3), two permutation functions are needed: $\mathrm{PERMUTE}^{3}_{0}\{x_2 x_1 x_0\} = x_2 x_0 x_1$ and $\mathrm{PERMUTE}^{3}_{1}\{x_2 x_1 x_0\} = x_1 x_0 x_2$. To see the effect of this sequence of permutation functions, examine its effect on the octal number $210_8$. This number is chosen to illustrate where the digits are mapped at each Stage in the network 14. 210 is mapped by $\mathrm{PERMUTE}^{3}_{0}$ to 201 and that is then mapped by $\mathrm{PERMUTE}^{3}_{1}$ to 012. The permutation functions are chosen so that each digit number (e.g., 0, 1, and 2) appears in the least significant position once. Clearly, these permutation functions meet the condition (notice the least significant digit of each value: 0 in 210, 1 in 201, and 2 in 012). This condition guarantees that every network I/O port 20 will have a path to every other network I/O port 20. Another $\mathrm{PERMUTE}^{3}_{1}$ function that could be used with the given $\mathrm{PERMUTE}^{3}_{0}$ function is $\mathrm{PERMUTE}^{3}_{1}\{x_2 x_1 x_0\} = x_0 x_1 x_2$. This would produce the mappings 210 to 201 to 102, which meets the constraint. If either $\mathrm{PERMUTE}^{3}_{1}$ function were exchanged with the $\mathrm{PERMUTE}^{3}_{0}$ function, the respective inverse networks 14 would be obtained.
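The condition that each digit reaches the least significant position exactly once can be tested by tracing a tag such as 210 through the sequence of permutations. The following is a minimal sketch under stated assumptions: a permutation is encoded as a tuple of source digit indices, most significant output digit first, so that (2, 0, 1) stands for the output digits x2 x0 x1. This encoding and all names are illustrative and are not taken from the disclosure.

```python
from typing import List, Sequence, Tuple

Perm = Tuple[int, ...]

PERMUTE_3_0: Perm = (2, 0, 1)      # x2 x1 x0 -> x2 x0 x1
PERMUTE_3_1: Perm = (1, 0, 2)      # x2 x1 x0 -> x1 x0 x2
PERMUTE_3_1_ALT: Perm = (0, 1, 2)  # x2 x1 x0 -> x0 x1 x2


def apply_perm(perm: Perm, digits: str) -> str:
    """Permute a digit string written most significant digit first."""
    n = len(digits)
    return "".join(digits[n - 1 - i] for i in perm)


def trace(perms: Sequence[Perm], start: str) -> List[str]:
    """Value of the tag before the first Stage and after each permutation."""
    values = [start]
    for perm in perms:
        values.append(apply_perm(perm, values[-1]))
    return values


def covers_all_digits(perms: Sequence[Perm]) -> bool:
    """True if every digit lands in the least significant position exactly once."""
    n = len(perms) + 1
    start = "".join(str(i) for i in reversed(range(n)))  # e.g. "210"
    least_significant = [value[-1] for value in trace(perms, start)]
    return sorted(least_significant) == sorted(start)


if __name__ == "__main__":
    print(trace([PERMUTE_3_0, PERMUTE_3_1], "210"))
    print(covers_all_digits([PERMUTE_3_0, PERMUTE_3_1]))
```

Applying the same stage function twice, for example `[PERMUTE_3_0, PERMUTE_3_0]`, fails the check because digit 2 never reaches the least significant position.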
The topology specified by $\mathrm{PERMUTE}^{3}_{0}$ and $\mathrm{PERMUTE}^{3}_{1}$ should be thought of as the virtual network 14 topology. Due to the mapping capabilities of the switch nodes 16, discussed further hereinafter, the physical cabling will not necessarily match this topology. The network 14, however, behaves as though it does have this topology.
In the preferred embodiment, it is also necessary to consider the topology of a network 14 of 4096 network I/O ports 20 (n=4). This requires three permutation functions: $\mathrm{PERMUTE}^{4}_{0}\{x_3 x_2 x_1 x_0\} = x_3 x_2 x_0 x_1$, $\mathrm{PERMUTE}^{4}_{1}\{x_3 x_2 x_1 x_0\} = x_1 x_0 x_3 x_2$, and $\mathrm{PERMUTE}^{4}_{2}\{x_3 x_2 x_1 x_0\} = x_3 x_2 x_0 x_1$. This sequence of permutation functions maps octal $3210_8$ to $3201_8$ to $0132_8$ to $0123_8$. Again, notice that each digit appears in the least significant position once. This set of functions is chosen because $\mathrm{PERMUTE}^{4}_{0}$ and $\mathrm{PERMUTE}^{4}_{2}$ leave the most significant two digits undisturbed. The physical consequence of this is to minimize the cable length in those two Stages. In the worst case, the distance between an output from one Stage to the input of the next Stage can be no greater than 64 Levels. For example, examination of FIG. 3 shows the worst case length to be from Level 7 to Level 56. Note that a network 14 of 4096 network I/O ports 20 would contain 64 copies of FIG. 3 in Stages 0 and 1 and another 64 copies would make up Stages 2 and 3. $\mathrm{PERMUTE}^{4}_{1}$ would specify the interconnection between the two sets of 64 subnetworks.
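The n=4 mappings and the bound on cable span can be verified numerically. The sketch below is illustrative only: it encodes each permutation as a tuple of source digit indices (most significant output digit first), an encoding assumed for this example rather than taken from the disclosure, and measures span as the difference between the output Level and the input Level it connects to.

```python
from typing import Tuple

Perm = Tuple[int, ...]

PERMUTE_4_0: Perm = (3, 2, 0, 1)  # x3 x2 x1 x0 -> x3 x2 x0 x1
PERMUTE_4_1: Perm = (1, 0, 3, 2)  # x3 x2 x1 x0 -> x1 x0 x3 x2
PERMUTE_4_2: Perm = (3, 2, 0, 1)  # x3 x2 x1 x0 -> x3 x2 x0 x1


def permute(level: int, perm: Perm) -> int:
    """Rearrange the octal digits of a Level according to perm."""
    x = [(level >> (3 * i)) & 7 for i in range(len(perm))]  # x[i] is digit i
    result = 0
    for source_index in perm:
        result = result * 8 + x[source_index]
    return result


def worst_case_span(perm: Perm, n_levels: int = 4096) -> int:
    """Largest Level distance between an output and the input it connects to."""
    return max(abs(permute(level, perm) - level) for level in range(n_levels))


if __name__ == "__main__":
    level = 0o3210
    for perm in (PERMUTE_4_0, PERMUTE_4_1, PERMUTE_4_2):
        level = permute(level, perm)
        print(f"{level:04o}")
    print(worst_case_span(PERMUTE_4_0))
```

Because the first and last functions only swap the two low-order digits, their span is at most 7 x 7 = 49 Levels (the Level 7 to Level 56 case), within the 64-Level bound stated above.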
3. SWITCH NODES
FIG. 4 describes the components of an 8x8 switch node 16 according to the present invention. FIG. 4 shows the basic circuitry required for communications from left to right through 9-bit forward channels 32, and for receiving and transmitting, from right to left, serial replies through 1-bit back channels 34. To implement a "folded" network 14, a duplicate but reversed 8x8 switch node 16 having the elements shown in FIG. 4 is required for communications from right to left through 9-bit forward channels 32, and for receiving and transmitting, from left to right serial replies, through 1-bit back channels 34.
The organization of the switch node 16 is modular; there are eight identical copies of the input port logic (IPL) 36 and eight identical copies of the output port logic (OPL) 38. Each switch node 16 is a crossbar so that each input port can be connected to any of the output ports. Each input port receives a forward channel 32 comprising eight bits of parallel data and one bit of parity; each input port transmits a back channel 34 comprising one bit of serialized data. Each output port receives a back channel 34 comprising one bit of serialized data; each output port transmits a forward channel 32 comprising eight bits of parallel data and one bit of parity.
Each IPL 36 is comprised of the following logic components, which are described further hereinafter: hard carrier timer 44, input FIFO 46, command/data latch 48, tag latch 50, command decode 52, parity check 54, input state control 56, output port select 58, data select mux 60, feedback select 62, command generator 64, input status register 66, back channel mux 68, reply generator 70, port level register 72, back channel output mux 74. Each OPL 38 is comprised of the following logic components, which are described further hereinafter: hard carrier logic 84, hard carrier timer 86, output status register 92, parity check 94, output state control 96, 8-input arbiter 98, path select 100, output mux 102, output latch 104, command generator 106, reply decode 110, receive FIFO 112, back channel FIFO 114, clock select 116. In addition, the switch node 16 comprises the following logic components, which are described further hereinafter: hard carrier timer generator 88, hard carrier timeout value register 90, all out busy monitor 118, merge logic 120, diagnostic port logic (DPL) 122, back channel interface 124, diagnostic port interface (DPI) 126, read/write control register 128, multicast port select register 130, tag mapping table
108, and chip address register 121.
Within the IPL 36, the input state control 56 constantly monitors the input on the forward channel 32 for the periodic presence of hard carriers, which indicates that the input port is connected to another switch node 16 or a TAXI transceiver 26. If the forward channel 32 input is directly interfaced to the TAXI transceiver 26, the presence of a hard carrier is indicated by a strobe of a CSTRBI signal 42 generated by a TAXI transceiver 26. If the forward channel 32 input is directly interfaced to another switch node 16, the presence of a hard carrier is indicated by the reception of a hard carrier escape code. Upon receipt of a hard carrier, a hard carrier timer 44 in the IPL 36 loads in two times the count value from a hard carrier timeout value register 90. The hard carrier timer 44 then counts down and another hard carrier must be received prior to the counter reaching zero; otherwise a hard carrier lost flag is set in the input status register 66. If the input port is not directly interfaced with a TAXI transceiver 26, the hard carrier timer 44 for the back channel 34 is disabled.
Within the OPL 38, the output state control 96 constantly monitors the input from the back channel 34 for the periodic presence of a hard carrier whenever it is directly interfaced to a TAXI transceiver 26. The presence of the carrier is indicated by a strobe of a CSTRBI signal 42 generated by the TAXI transceiver. Upon receipt of a hard carrier, a hard carrier timer 86 in the OPL 38 loads in two times the count value from a hard carrier timeout value register 90. The hard carrier timer 86 then counts down and another hard carrier must be received prior to the counter reaching zero; otherwise a hard carrier lost flag is set in the output status register 92. If the output port is not directly interfaced with a TAXI transceiver 26, the hard carrier timer 86 for the back channel 34 is disabled.
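The hard carrier timers 44 and 86 described above behave as reloadable countdown timers. The following minimal sketch models that behavior in software for illustration; the class name, the per-call `tick` granularity, the initial load at construction, and the sticky lost flag are all assumptions, since the text does not specify them.

```python
class HardCarrierTimer:
    """Illustrative model of a hard carrier timer (not the actual circuit)."""

    def __init__(self, timeout_value: int, enabled: bool = True) -> None:
        self.timeout_value = timeout_value  # from the timeout value register
        self.enabled = enabled              # assumed False when no TAXI interface
        self.count = 2 * timeout_value      # assumption: loaded at start-up
        self.carrier_lost = False           # models the hard carrier lost flag

    def carrier_received(self) -> None:
        """Reload with two times the timeout value on each hard carrier."""
        self.count = 2 * self.timeout_value

    def tick(self) -> None:
        """Count down once; set the lost flag if zero is reached first."""
        if not self.enabled or self.carrier_lost:
            return
        self.count -= 1
        if self.count == 0:
            self.carrier_lost = True


if __name__ == "__main__":
    timer = HardCarrierTimer(timeout_value=3)
    for _ in range(5):
        timer.tick()
    timer.carrier_received()   # carrier arrives in time, timer reloads
    for _ in range(6):
        timer.tick()           # no further carrier: counter reaches zero
    print(timer.carrier_lost)
```

Loading twice the timeout value gives the sender a full extra period of slack before a loss is declared.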
The OPL 38 also maintains the presence of a hard carrier on a forward channel 32 output. If there is no circuit active, the OPL 38 generates a hard carrier every time it receives a signal from the hard carrier timer generator 88, and upon reaching zero, the hard carrier timer generator 88 is reloaded from the hard carrier timeout value register 90. If a circuit is established, the OPL 38 generates a hard carrier whenever the IPL 36 to which it is connected receives a hard carrier. If the forward channel 32 output is directly interfaced to another switch node 16, the hard carrier that is generated takes the form of a hard carrier escape code. If the forward channel 32 output is directly interfaced to a TAXI transceiver 26, the hard carrier is generated by the TAXI transceiver 26 as a result of not receiving anything from the switch node 16 OPL 38 forward channel 32 for one cycle.
When no circuit is established or pending, the switch nodes 16 and sending controllers 18 always generate a continuous stream of soft carrier commands. The controllers 18 and switch nodes 16 always expect to receive the soft carrier when there is no circuit established or pending. If the soft carrier or another legal command is not received immediately, a soft carrier loss error is reported by setting the appropriate bit of an input status register 66.
When a circuit is connected, pending connect, or pending disconnect, switch nodes 16 and controllers 18 always expect to receive an idle command when nothing else is expected. If an idle command or another legal command is not received, the forward channel loss bit or an idle loss error bit is set in the input status register 66.
4. NETWORK CONTROLLERS
FIG. 5 is a block diagram describing the components of the controllers 18 that connect each PM 12 to the networks 14. A controller 18 comprises a SPARC(TM) microprocessor 56 controlling the transfer of data through an input/output processor (IOP) 58. The IOP 58 communicates directly with a system bus 136 connected to the PM 12 and with the network 14 via phase locked TAXI transmitters 148 and receivers 150, and an optical transceiver 22. The TAXI transmitters 148 and TAXI receivers 150 are used to serialize and de-serialize data for transmission over optical fiber 24.
The controller 18 outputs a forward channel 32 consisting of eight bits of data plus a single bit parity, and a one bit back channel 34 associated with the receive channel to the TAXI transmitter 148. The controller 18 receives a forward channel
32 consisting of eight bits of data plus a single bit of parity and a one bit back channel 34 associated with the transmit channel from the TAXI receiver 150. The TAXI transmitter 148 converts the 10 bits of parallel data into bit serial data that encodes clock information into the data stream. The TAXI receiver 150 converts the bit serial data back into 10 bits of parallel data and recovers the clock. Each TAXI transmitter 148 on the controller 18 derives its clock input from the clock output of the TAXI receiver 150 via the phase locked loop 146. This allows each controller 18 to maintain synchronization to a master clock 28 distributed via the network 14.
5. DIAGNOSTIC PROCESSORS
As shown in FIG. 5, every controller 18 (and boards in FIGS. 6, 7, and 8) is interfaced to a diagnostic processor (DP) 140. There is one DP 140 per physical board that is interfaced to all the components on that board. All the DPs 140 are interconnected using a local area network (LAN) 144. During system startup, the DPs 140 have the ability to run self tests on the components and perform any initialization that is needed. During normal operation, the DPs 140 can respond to error conditions and facilitate logging them. Those DPs 140 that are interfaced to switch nodes 16 also participate in the process of reconfiguring the network 14 when errors are detected. A switch node 16 may detect numerous faults including parity errors, hard carrier loss, data over runs, back channel 34 loss, forward channel 32 loss, soft carrier loss, null loss, idle loss, FIFO errors, violation errors, tag errors, command/reply errors, time outs, and merge errors.
Referring again to FIG. 4, the diagnostic port interface (DPI) 126 in the diagnostic port logic (DPL) 122 of each switch node 16 allows the DP 140 to perform two types of activities within the switch node 16, i.e., reading and writing selected registers and sending information out any back channel 34 output. When the command decode 52 in the IPL 36 detects the presence of a DP 140 command or datum, it stores the command in the command/data and tag latches 48 and 50, and signals the DP 140 via the DPI 126. Using the DPI 126 and read/write control register 128, the DP 140 picks up the command. The DP 140 commands are always acknowledged with a reply from the DP 140 which is returned via the back channel 34 output.
A forced parity error register is provided in each IPL 36 and each OPL 38. It is used for forcing parity errors on a forward channel 32 in the OPL 38 or back channel 34 in the IPL 36. The DP 140 may read or write the register. If a given forced parity error register is set to 00 when a test command or test reply is received, and a circuit exists, then the command or reply is forwarded to the next switch node 16, but otherwise ignored. If the register is set to 01 when a test command is received, and a circuit exists, then the test command is forwarded to the next switch node 16 and the byte which immediately follows has its parity bit inverted before being forwarded to the next switch node 16 (however, the forwarding switch node 16
does not report an error). If the register is set to 01 when a test reply is received and a circuit exists, then the test reply is "backwarded" to the previous switch node 16 with its first parity bit inverted (however, the "backwarding" switch node 16
does not report an error). In either case, the register is then cleared to zero. If the register is set to 10, then the behaviors are the same as the 01 case; except that the parity is inverted continuously as long as the register is set to 10, and the register is not automatically cleared to 00.
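The three described settings of the forced parity error register can be summarized as a small state transition. The sketch below is a simplified software model for illustration only: it covers just the effect of the register on one affected parity bit when a test command or test reply passes through an established circuit, and the names `OFF`, `ONCE`, `CONTINUOUS`, and `apply_forced_parity` are assumptions. The value 11 is not described in the text, so the sketch rejects it rather than guessing.

```python
from typing import Tuple

OFF = 0b00         # forward unchanged
ONCE = 0b01        # invert one parity bit, then clear the register
CONTINUOUS = 0b10  # invert parity for as long as the register holds this value


def apply_forced_parity(register: int, parity_bit: int) -> Tuple[int, int]:
    """Return (new register value, parity bit actually forwarded)."""
    if register == OFF:
        return OFF, parity_bit
    if register == ONCE:
        return OFF, parity_bit ^ 1          # register clears itself to 00
    if register == CONTINUOUS:
        return CONTINUOUS, parity_bit ^ 1   # register is not cleared
    raise ValueError("register value 0b11 is not described in the text")


if __name__ == "__main__":
    register = ONCE
    for parity in (1, 1, 1):
        register, sent = apply_forced_parity(register, parity)
        print(f"register={register:02b} sent parity={sent}")
```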
6. PACKAGING
In the preferred embodiment, each network 14 is constructed using up to four different boards, i.e., Type-A, -B, -C, and -D boards. Type-A and -D boards are used if the network 14 contains between 2 and 64 network I/O ports 20; Type-A, -B, and -D boards are used if the network 14 contains between 65 and 512 network I/O ports 20; and Type-A, -C, and -D boards are used if the network 14 contains between 513 and 4096 network I/O ports 20.
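The board selection rule above reduces to a lookup on the number of network I/O ports. A minimal sketch follows; the function name `board_types` and the tuple return format are illustrative assumptions.

```python
from typing import Tuple


def board_types(n_ports: int) -> Tuple[str, ...]:
    """Board types used for a network with the given number of I/O ports."""
    if not 2 <= n_ports <= 4096:
        raise ValueError("the text describes networks of 2 to 4096 ports")
    if n_ports <= 64:
        return ("Type-A", "Type-D")
    if n_ports <= 512:
        return ("Type-A", "Type-B", "Type-D")
    return ("Type-A", "Type-C", "Type-D")


if __name__ == "__main__":
    for n in (64, 65, 512, 513, 4096):
        print(n, board_types(n))
```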
7. TYPE-A BOARD
FIG. 6 describes a Type-A board 170. As described hereinbefore, the network 14 is physically folded and the switch nodes 16 are paired so that a "left" switch node 16 in a specific stage and level is physically adjacent to a "right" switch node
16 in the same stage and level. Each Type-A board 170 contains one such stage 0 switch node 16 pair and one such stage 1 switch node 16 pair. Consequently, eight properly connected Type-A boards 170 form a network 14 having 64 network I/O ports 20.
Up to eight PMs 12 may connect via controllers 18 to optical transceivers 22 on each Type-A board 170. The optical transceivers 22 communicate, via TAXI transceivers 148 and 150, with the eight input ports of a first 8x8 switch node 16 in stage 0. Each of the output ports from the first stage 0 switch node 16 communicates with the input ports of a first stage 1 switch node 16. Up to eight Type-A boards 170 cross-connect between the first stage 0 switch nodes 16 and the first stage 1 switch nodes 16, in a manner described in FIG. 3, via a backplane (not shown). The first stage 1 switch node 16 connects to TAXI transceivers 148 and 150 which either loop back (at the bounce-back point 30) to connect to adjacent TAXI transceivers 148 and 150 in a network 14 with 64 or fewer network I/O ports 20, or connect to a Type-B board 172 (discussed below) in a network 14 having between 65 and 512 network I/O ports 20, or connect to a Type-C board 174 (discussed below) in a network 14 having between 513 and 4096 network I/O ports 20. The TAXI transceivers 148 and 150 connect to the input ports of a second stage 1 switch node 16. The output ports of the second stage 1 switch node 16 connect to the input ports of a second stage 0 switch node 16. Up to eight Type-A boards 170 cross-connect between the second stage 1 switch nodes 16 and the second stage 0 switch nodes 16, in a manner described in FIG. 3, via the backplane. The output ports of the second stage 0 switch node 16 connect to the optical transceivers 22, via TAXI transceivers 148 and 150, and thus to the eight PMs 12.
Note that when interfacing to a TAXI transceiver 148 and 150, output port i from the switch node 16 handling left to right paths is paired with input port i from the switch node 16 handling right to left paths, and vice versa. (For the sake of brevity and clarity, however, FIG. 6 shows only the back channel connections, as dotted lines, from the TAXI transmitter 148 at the bottom of FIG. 6 to the seventh input port on the #1 switch node 16 and from the seventh output port on the #2 switch node 16 to the TAXI receiver 150 on the bottom of FIG. 6.) Thus, any one of the PMs 12 can connect to another of the PMs 12 by appropriate switching of the stage 0 and stage 1 switch nodes 16.
8. TYPE-B BOARD
FIG. 7 describes a Type-B board 172. Each Type-B board 172 contains two switch node 16 pairs. The switch node 16 pairs are in stage 2 of any network 14 with more than 64 network I/O ports 20. These switch nodes 16 are on either side of the bounce-back point 30 and thus represent the point at which data "bounces back", "turns around", or reverses direction in the folded network 14. In networks 14 supporting between 65 and 512 network I/O ports 20, the stage 1 switch nodes 16 on the Type-A boards 170 are interconnected with the stage 2 switch nodes 16 on the Type-B boards 172 to effect an expansion of the network 14. Thus, any one of the PMs 12 can connect to another of the PMs 12 by appropriate switching of the stage 0, stage 1, and stage 2 switch nodes 16.
9. TYPE-C BOARD
FIG. 8 describes a Type-C board 174. For a system 10 supporting between 513 and 4096 network I/O ports 20, an additional stage of switch nodes 16 (stage 3) is required, with the switch nodes 16 in stage 3 communicating with the switch nodes 16
of stage 2. Both stage 2 and stage 3 switch nodes 16 are implemented on the Type-C board 174. The switch nodes 16 labeled as #1-#4 are in stage 2 of the network 14; switch nodes 16 labeled as #5-#8 are in stage 3 of the network 14.
The input ports of a first stage 2 switch node 16 connect to Type-D boards 176 via TAXI transceivers 148 and 150. Each of the output ports from the first stage 2 switch node 16 communicates with the input ports of a first stage 3 switch node 16. Up to four Type-C boards 174 cross-connect between the first stage 2 switch nodes 16 and the first stage 3 switch nodes 16, in a manner described in FIG. 3, via a backplane (not shown). The first stage 3 switch node 16 loops back (at the bounce-back point 30) to connect to the input ports of a second stage 3 switch node 16. The output ports of the second stage 3 switch node 16 connect to the input ports of a second stage 2 switch node 16. Up to four Type-C boards 174 cross-connect between the second stage 3 switch nodes 16 and the second stage 2 switch nodes 16, in a manner described in FIG. 3, via the backplane. The output ports of the second stage 2 switch node 16 connect to Type-D boards 176 via TAXI transceivers 148 and 150. Note that when interfacing to a TAXI transceiver 148 and 150, output port i from the switch node 16 handling left to right paths is paired with input port i from the switch node 16 handling right to left paths, and vice versa. (For the sake of brevity and clarity, however, FIG. 8 shows only the back channel connections, as dotted lines, from the TAXI transmitter 148 at the bottom of FIG. 8 to the seventh input port on the #3 switch node 16 and from the seventh output port on the #4 switch node 16 to the TAXI receiver 150 on the bottom of FIG. 8.)
10. COMMUNICATION MODULE ASSEMBLY
Each cabinet housing the components of the network 14 contains up to six Communication Module Assemblies (CMAs). The packaging of components within the CMAs is intended to minimize configuration errors and simplify manufacturing and field upgrading. There are three types of CMAs, i.e., CMA/A, CMA/B, and CMA/C, depending on the size of the network 14: the CMA/A type is used in networks 14 supporting between 2 and 64 network I/O ports 20; the CMA/A and CMA/B types are used in networks 14
supporting between 65 and 512 network I/O ports 20; and the CMA/A and CMA/C types are used in networks 14 supporting between 513 and 4096 network I/O ports 20.
FIG. 9 illustrates a network 14 comprising a single CMA/A 182, which supports between 2 and 64 network I/O ports 20. The CMA/A 182 contains a power board, up to 8 Type-A boards 170, and 2 Type-D boards 176. The Type-A boards 170 and Type-D boards 176 are arranged in two groups of five boards each. In each group, the first two slots hold Type-A boards 170, the next slot holds a Type-D board 176, and the remaining two slots hold Type-A boards 170. The UWP between stage 0 and stage 1 switch nodes 16 is embedded in a backplane 180.
The Type-D board 176 in the CMA/A 182 interconnects up to four Type-A boards 170 in a CMA/A 182 to up to four Type-B boards 172 in a CMA/B 184. The rationale behind the Type-D board 176 is that there is no room for electrical connectors on the front panels of Type-A boards 170 to carry the signals from the Type-A boards 170 in the CMA/A 182 to Type-B boards 172 in a CMA/B 184. Therefore, the Type-D board holds four connectors on its front and the board is used only as a repeater of high speed TAXI signals. There can be up to two Type-D boards in a CMA/A 182 to service eight Type-A boards 170 in the CMA/A 182.
FIG. 10 describes circuit switching within the CMA/A 182 and illustrates the Type-A board 170 connections to the backplane 180 and the PMs 12. In the preferred embodiment, all the stage 0 to stage 1 interconnections are between Type-A boards 170
residing in the same CMA/A 182, so the interconnection pattern, i.e., the UWP, between the stages is embedded in a backplane 180.
Within the Type-A boards 170, the bounce-back point 30 is created by connecting each of the eight TAXI transmitters 148 to the corresponding TAXI receivers 150 (see also, FIG. 6). Note that for a network 14 of this size, as an option, a non-expandable Type-A board 170 could be used with the following modifications to the board shown in FIG. 6: (1) the output TAXI transceivers 148 and 150 on the right side of FIG. 6 would be eliminated; and (2) the outputs from the switch node 16 labeled as #3 would be connected directly to the inputs to the switch node 16 labeled as #4. Doing this would substantially lower the power consumption (by approximately 1/3) and cost of the Type-A board 170. The main drawback is having an additional board type. However, this configuration could be expected to meet the needs of many systems.
FIG. 11 illustrates a network 14 having CMA/As 182 and CMA/Bs 184, which support between 65 and 512 network I/O ports 20. Each CMA/B 184 houses eleven slots containing a power board, two dummy slots, and two groups of four Type-B boards 172. For networks 14 supporting between 65 and 512 network I/O ports 20, each fully configured CMA/A 182 requires connection to one group in a CMA/B 184, i.e., every Type-B board 172 can connect to two Type-A boards 170. For networks 14 supporting 64 or fewer network I/O ports 20, no CMA/B 184 is required. In the preferred embodiment, the stage 1 to stage 2 interconnection pattern, i.e., the UWP, is embedded in a backplane 180 in the CMA/B 184. (Two backplanes 180 are shown in FIG. 11 because each group of four Type-B boards uses a different backplane.)
FIG. 12 illustrates a network 14 having CMA/As 182 and CMA/Cs 186, which support between 513 and 4096 network I/O ports 20. Each CMA/C 186 houses a power board, two dummy boards, and up to two groups comprised of four Type-C boards 174. For networks 14 supporting between 513 and 4096 network I/O ports 20, each fully configured CMA/A 182 requires connection to one group in a CMA/C 186, i.e., every Type-C board 174 can connect to two Type-A boards 170. In the preferred embodiment, all the stage 2 to stage 3 interconnections are between Type-C boards 174 residing in the same CMA/C 186, so the interconnection pattern, i.e., the UWP, between the stages is embedded in a backplane 180. (Two backplanes 180 are shown in FIG. 12 because each group of four Type-C boards uses a different backplane).
11. SIMPLIFIED CABLING
In the present invention, simplified cabling is intended to minimize configuration errors and simplify manufacturing and field upgrading. It is desirable to manufacture cables with a minimum number of different lengths. Without this capability, a given cable might not reach a specific connector in the specified CMA, although there are some connectors in that CMA it does reach. With this capability, it can be plugged into the connector that it does reach. In the field, connectors can be moved as needed for routing convenience. Thus, field engineers do not have to deal with as many configuration errors.
In the present invention, signal wires are grouped into multiconductor cables so that the number of cables that have to be handled is minimized. Cables within the network 14 can be plugged into almost any available connector in a chassis with minimal constraints. There are only two constraints on how to install cables: (1) two ends of the same cable cannot be plugged into the same board type; and (2) each cable end is constrained only as to which of several CMA/As 182 or CMA/Bs 184 (which group in the case of a CMA/B 184) it is connected. The cable may be plugged into any available connector in the correct CMA/A 182 or CMA/B 184, i.e., any of the four connectors on either Type-D board 176 in a CMA/A 182 or either connector on any of the four Type-B boards 172 in either group of a CMA/B 184. However, a connector on the Type-D board 176 is not considered available unless the slot to which it is wired contains a Type-A board 170. Unavailable connectors may be capped in manufacturing.
FIG. 13 (a) illustrates a cable harness assembly 178, wherein each cluster of eight cables labeled with a letter (A through R) plugs into one bidirectional switch node 16 pair. Connectors A through H connect to switch nodes 16 on Type-A boards 170 (through the Type-D board 176) and J through R connect to switch nodes 16 on Type-B boards 172. FIG. 13 (b) provides a simplified representation of the cable harness assembly 178 of FIG. 13 (a).
Due to limited space for cable routing within a cabinet and the complexity of the cable harness assembly 178, it is preferable to avoid manufacturing a cable harness assembly 178 which is physically constructed as shown. Hence, the cabling is implemented as follows.
For a network 14 with at least 65 but no more than 512 network I/O ports 20, one type of cable harness assembly 178 with variations in length is used. This cable harness assembly 178 is illustrated in FIG. 14 and is equivalent to the cable harness assembly 178 shown in FIGS. 13 (a) and (b). The cable harness assembly 178 comprises eight bundles, labeled A-H, wherein each bundle has eight pairs of coaxial cable. The cross connections are embedded in the backplane 180 to which the Type-B boards 172 are attached. The two connectors attached to the front panel of Type-B boards 172 are wired directly to the backplane 180 where they are distributed to the appropriate stage 2 switch nodes 16. The net result is as though the cable harness assembly 178 of FIGS. 13 (a) and (b) is used and each of its connectors, J through R, are directly connected to the TAXI transceivers 148 and 150 of a bidirectional switch node 16 pair on a Type-B board 172 instead of being routed through the backplane 180.
As additional network I/O ports 20 are added, only an approximately proportional amount of hardware is added, in most cases. Thus, the network 14 may be expanded in small increments while maintaining performance, in contrast to prior art networks 14 which require large increments of hardware to be added to maintain bandwidth when certain size boundaries are crossed, e.g., N=b.sup.i +1, wherein N is the number of network I/O ports 20, b is the number of switch node 16 I/O ports, and i=1, 2, etc.
The cabling of networks 14 with more than 64 network I/O ports 20 allows for graceful expansion as the number of network I/O ports 20 is increased. The number of additional boards is kept to a minimum. As additional network I/O ports 20 are added to a network 14, the need to add Type-A boards 170 is determined by such factors as: (1) the number of Stage 0 to Stage 1 paths available by virtue of the Type-A boards 170 already present; (2) the percentage of the maximum possible bandwidth desired; (3) the number of optical transceivers 22 needed to physically connect all PMs 12; and (4) the number of CMAs that must be cross-linked.
As a network 14 grows from N=1 to N=512, either no additional hardware is required when a processor is added (the majority of the cases, i.e., 448 out of 512), or there is a linear increase of up to one additional resource of each type (57 out of 512 cases), or there is a discontinuity with more than linear growth (7 out of 512 cases).
The seven discontinuities are shown in Table I. The increment from 64 → 65 is the worst case percentage-wise, because that marks the transition from two stages to three stages. At all remaining discontinuities, the percentage increase is never greater than 12.5% (1/8th) beyond linear. There is no compounding effect due to the discontinuities in that, once a discontinuity is crossed, as N grows, no additional hardware is added at all until the linear growth relationship is restored, i.e., N "catches up" to the number of Type-A boards 170 or Type-B boards 172. This is illustrated in Table I where the ratios of numbers before the discontinuity are always perfectly linear, but not after. For example, in the "Type-A" column, X.sub.A → Y.sub.A is the change shown and, correspondingly, in the "N" column, X.sub.N → Y.sub.N. Therefore, X.sub.A /X.sub.N is always 1/8th, which is perfect because one Type-A board 170 can accommodate eight network I/O ports 20.
The minimum percentage of maximum possible bandwidth in a network 14 may be arbitrarily set to 50%. In order to maintain this bandwidth, the following formulae are used to calculate the number of CMA/As 182 (#CMA/A), CMA/Bs 184 (#CMA/B), Type-A boards 170 (#A), Type-B boards 172 (#B), and Type-D boards 176 (#D):
#CMA/A = ⌈N/64⌉
#A = MAX(⌈N/8⌉, (8 * ⌊(N-1)/64⌋ + MAX(⌈(N MOD 64)/8⌉, ⌈SQRT((N MOD 64)/2)⌉, ⌈N/128⌉ * (N>64))))
#B = (N>64) * 4 * ⌈N/128⌉
#D = (#CMA/A + MIN(#CMA/A-1, (2 * #B - MIN(4, #A - 8 * (#CMA/A-1))) MOD (4 * (#CMA/A-1))))
#CMA/B = ⌈#B/8⌉
wherein MAX is a maximum function, MIN is a minimum function, ⌈ ⌉ is a ceiling function, ⌊ ⌋ is a floor function, MOD is an integer remainder, SQRT is a square root, and > is a boolean "greater than" function.
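The 50% formulae can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, the boolean (N>64) is taken as 0 or 1, and the range is limited to 1 through 512 network I/O ports 20. The #D formula is left out because its MOD term has a zero modulus when there is a single CMA/A 182, and the text does not say how that case is to be read.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def ceil_sqrt_half(m: int) -> int:
    """Smallest integer k with k >= SQRT(m / 2), found without floats."""
    k = 0
    while 2 * k * k < m:
        k += 1
    return k


def boards_50_percent(n: int) -> dict:
    """Counts for n network I/O ports at the 50% bandwidth level."""
    if not 1 <= n <= 512:
        raise ValueError("sketch covers 1..512 network I/O ports")
    rem = n % 64                  # N MOD 64: ports in the partially filled CMA/A
    big = 1 if n > 64 else 0      # the boolean (N>64), used as a 0/1 factor
    type_a = max(
        cdiv(n, 8),
        8 * ((n - 1) // 64)
        + max(cdiv(rem, 8), ceil_sqrt_half(rem), cdiv(n, 128) * big),
    )
    type_b = big * 4 * cdiv(n, 128)
    return {
        "CMA/A": cdiv(n, 64),
        "A": type_a,
        "B": type_b,
        "CMA/B": cdiv(type_b, 8),
    }
```

Under these assumptions, boards_50_percent(65) gives nine Type-A boards and four Type-B boards, i.e., one Type-A board in the second CMA/A and one group of Type-B boards.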
To configure a system 10 for N PMs 12 such that 100% of the maximum possible bandwidth is available, the following formulae are used to determine the number of CMA/As 182 (#CMA/A), CMA/Bs 184 (#CMA/B), Type-A boards 170 (#A), Type-B boards 172 (#B), and Type-D boards 176 (#D) that are required:
#CMA/A = ⌈N/64⌉
#A = MAX(⌈N/8⌉, (8 * ⌊(N-1)/64⌋ + MAX(⌈SQRT(N MOD 64)⌉, ⌈N/64⌉ * (N>64))))
#B = (N>64) * MAX(⌈#A/2⌉, 4 * ⌊(N-1)/64⌋ + ⌈N/128⌉)
#D = ⌈#A/4⌉
#CMA/B = ⌈#B/8⌉
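As an illustration, the 100% formulae can be transcribed directly into Python. This is a sketch under stated assumptions rather than part of the disclosed embodiment: the function names are hypothetical, (N>64) is treated as 0 or 1, and N is limited to 1 through 512.

```python
from math import isqrt


def cdiv(a: int, b: int) -> int:
    """Ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def ceil_sqrt(m: int) -> int:
    """Ceiling of SQRT(m) for a non-negative integer m."""
    return 0 if m == 0 else isqrt(m - 1) + 1


def boards_100_percent(n: int) -> dict:
    """Counts for n network I/O ports at 100% of the maximum bandwidth."""
    if not 1 <= n <= 512:
        raise ValueError("sketch covers 1..512 network I/O ports")
    big = 1 if n > 64 else 0      # the boolean (N>64), used as a 0/1 factor
    type_a = max(
        cdiv(n, 8),
        8 * ((n - 1) // 64) + max(ceil_sqrt(n % 64), cdiv(n, 64) * big),
    )
    type_b = big * max(cdiv(type_a, 2), 4 * ((n - 1) // 64) + cdiv(n, 128))
    return {
        "CMA/A": cdiv(n, 64),
        "A": type_a,
        "B": type_b,
        "D": cdiv(type_a, 4),
        "CMA/B": cdiv(type_b, 8),
    }
```

For N=512 the sketch returns eight CMA/As, 64 Type-A boards, 16 Type-D boards, 32 Type-B boards, and four CMA/Bs.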
Table II shows an example of the number of Type-A boards 170 needed versus the number of PMs 12 for a network 14 with up to 64 network I/O ports 20 if only 50% of the maximum possible bandwidth is required. For up to 32 network I/O ports 20, the number of PMs 12 accommodated is determined by counting the number of connections between the switch nodes 16 on the number of boards indicated. Beyond 32 network I/O ports 20, the number of boards required is strictly determined by the number of optical transceivers 22 required to accommodate that number of PMs 12.
Table III shows an example of the number of Type-A boards 170 to install in the least populated CMA/A 182 given the number of PMs 12 to be connected to the depopulated CMA/A 182. This assumes 100% of the maximum possible bandwidth is to be provided. In this case, the number of boards required is always limited by the number of connections available between Stage 0 and Stage 1 switch nodes 16. In a network 14 with more than 64 PMs 12, a Type-B board 172 is provided for every two Type-A boards 170. However, there must be at least as many Type-B boards 172 as there are CMA/As 182, so extra boards may have to be added. In most cases, if any additional hardware is required, the addition of a single PM 12 to the network 14 may require the addition of one Type-A board 170, and one Type-B board 172 per network 14. If the current number of PMs 12 is a multiple of 64, then the addition of a single PM 12 requires two to four additional Type-B boards 172, possibly an additional CMA/B 184 chassis, an additional CMA/A 182 chassis, 2 additional Type-D boards 176, and one additional Type-A board 170 for every group of four Type-B boards 172 (maximum of eight). On average, however, the number of boards and CMAs required is directly proportional to the number of PMs 12.
In the #A formula above, for 100% bandwidth, as the network 14 grows from 1 to 512 network I/O ports 20, the term:
⌈N/8⌉
makes sure there are enough network I/O ports 20 to plug PMs 12 into. This term handles the case where N is 64x.
The term:
⌊(N-1)/64⌋
calculates the number of completely full CMA/As 182, as long as there is at least one more partially populated one.
In the term:
⌈SQRT(N MOD 64)⌉
(N MOD 64) calculates the leftover part for the partially populated CMA/A 182 and the SQRT function accounts for the cross-connect between stages 0 and 1. If this is larger than the second term (B), then we are assured of being able to cross-connect all Type-B boards 172.
The term:
⌈N/64⌉
makes sure there are enough Type-A boards 170 to cross-connect with Type-B boards 172. This is where the overhead comes from.
The term:
(N>64)
assures that the (D) term is used only if N>64.
To compare the results for the #A formula for both N=64x and N=64x+1, 1<x<8, examine the following derivation:
MAX(⌈(64x+1)/8⌉, (8 * ⌊((64x+1)-1)/64⌋ +
MAX(⌈SQRT((64x+1) MOD 64)⌉, ⌈(64x+1)/64⌉))) -
MAX(⌈64x/8⌉, (8 * ⌊(64x-1)/64⌋ +
MAX(⌈SQRT(64x MOD 64)⌉, ⌈64x/64⌉)))
= MAX((8x+1), (8x + MAX(1, x+1))) -
MAX(8x, (8 * (x-1) + MAX(0, x)))
= MAX((8x+1), (8x+x+1)) -
MAX(8x, (8x-8+x))
= (9x+1) - 8x
= x+1
This is the number of Type-A boards 170 added in crossing over from N=64x to N=64x+1. Since we would expect to add 1 due to linear growth, the overhead is x. This percentage of the total is 100 * x/8x=1/8 * 100=12.5%. The overhead, x, comes from the term:
⌈N/64⌉
for N=64x+1, which accounts for providing cross connections to the Type-B boards 172. The constant overhead ratio is due to the fact that the number of extra boards grows as x, and networks 14 that are multiples of 64 in size, by definition grow as x. The 1/8th value is due to the fact that eight Type-A boards 170 are needed for every 64 network I/O ports 20 provided, but only one extra Type-A board 170 is needed per 64 network I/O ports 20 in the least populated CMA/A 182 to allow it to be connected to the Type-B boards 172.
If the above derivation were repeated for the remaining formulae, i.e., for the #CMA/A, #CMA/B, #B, and #D formulae, as illustrated in Table I, none of the increases would exceed 12.5%. Those skilled in the art will readily recognize how to derive the other formulae, based on the information given above.
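The x+1 result for the #A formula can also be confirmed by evaluating the 100% formula on both sides of each multiple of 64. The sketch below is illustrative only; the function names are hypothetical and (N>64) is treated as 0 or 1.

```python
from math import isqrt


def type_a_boards_full(n: int) -> int:
    """#A at 100% bandwidth for 1 <= n <= 512, transcribed from the formula."""
    def cdiv(a: int, b: int) -> int:
        return -(-a // b)

    rem = n % 64
    ceil_sqrt_rem = 0 if rem == 0 else isqrt(rem - 1) + 1
    big = 1 if n > 64 else 0
    return max(
        cdiv(n, 8),
        8 * ((n - 1) // 64) + max(ceil_sqrt_rem, cdiv(n, 64) * big),
    )


def crossing_overhead(x: int) -> tuple:
    """Boards added going from N=64x to N=64x+1, and overhead as a percent."""
    added = type_a_boards_full(64 * x + 1) - type_a_boards_full(64 * x)
    overhead = added - 1          # one board is expected from linear growth
    return added, 100 * overhead / (8 * x)
```

Under these assumptions, crossing_overhead(x) evaluates to (x+1, 12.5) for every x from 1 through 7, which agrees with the derivation.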
In changing from one network 14 size to another, it may be necessary and/or desirable to completely disconnect all of the intra-network 14 cables and reconnect them for the new configuration. For small networks 14 (relative to one with 512 network I/O ports 20), the changes will typically involve moving a small number of cables from one board to another as will be illustrated below.
For networks 14 with at least 65 and no more than 512 network I/O ports 20, the eight connectors at one end of the cable harness assembly 178 described above are attached to the corresponding eight connectors on the four Type-B boards 172 in one group of a CMA/B 184. The eight connectors at the other end of the cable harness assembly 178 are distributed evenly among CMA/As 182 that are fully populated with Type-A boards 170, and are attached to Type-D boards 176 within the selected CMA/As 182. Connectors that would be allocated to a CMA/A 182 that is partially filled with Type-A boards 170 are evenly redistributed to CMA/As 182 that have all eight Type-A boards 170.
For networks 14 with at least 65 and no more than 512 network I/O ports 20, to provide at least 50% of the maximum possible bandwidth, the number of cable harness assemblies used to interconnect X CMA/As 182 to ⌈X/4⌉ CMA/Bs 184 is X/2 if X is even and (X+1)/2 if X is odd, wherein ⌈X/4⌉ is a ceiling function providing the smallest integer not less than X/4. Cable harness assemblies can be added one at a time until there are a total of X cable harness assemblies, at which point 100% of the maximum possible bandwidth will be available.
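The harness counts above reduce to simple ceiling arithmetic, since X/2 for even X and (X+1)/2 for odd X are both ⌈X/2⌉. The following Python sketch is illustrative; the function name and dictionary keys are hypothetical, and X is assumed to lie between 2 and 8, the range implied by 65 to 512 network I/O ports 20.

```python
def harness_plan(cma_a_count: int) -> dict:
    """Cable harness counts for X CMA/As in a 65..512 port network."""
    x = cma_a_count
    if not 2 <= x <= 8:
        raise ValueError("sketch assumes 2..8 CMA/As")
    return {
        "cma_b_at_50_percent": -(-x // 4),     # ceiling of X/4
        "harnesses_at_50_percent": -(-x // 2), # X/2 if even, (X+1)/2 if odd
        "harnesses_at_100_percent": x,
    }
```

For example, harness_plan(8) yields four harnesses and two CMA/Bs at the 50% level and eight harnesses at the 100% level.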
FIG. 15 shows a simplified wiring diagram describing how the switch nodes 16 are connected in a network 14 having 128 network I/O ports 20. The CMAs are represented by the solid boxes. The left hand block represents a CMA/A 182 with eight Type-A boards 170. The right hand block represents a CMA/B 184 with two groups of four Type-B boards 172 each therein. Two cable harness assemblies are used to link the Type-A boards 170 in each CMA/A 182 to the Type-B boards 172 in the CMA/B 184.
FIGS. 16(a), (b), (c) and (d) provide simplified wiring diagrams describing the expansion from 64 PMs 12 to 65-128 PMs 12. In each case, each PM 12 gets at least 50% of the maximum possible bandwidth.
In FIG. 16 (a), CMA/A #1 need only contain one Type-A board 170 and one Type-D board 176 and only one connector from the CMA/A end of the cable harness assembly 178 is connected to the Type-D board 176. The other seven connectors are attached to any seven of the eight available Type-D connectors in CMA/A #0. Recall that the Type-A boards 170 comprise Stages 0 and 1 of the network 14, so all PMs 12 attached to CMA/A #0 can establish paths to switch nodes 16 in Stage 1 to which a cable is attached. The switch nodes 16 in Stage 0 will automatically sense any Stage 1 switch nodes 16 that are unconnected and avoid trying to establish paths through them. Note also that there would be up to 64 optical cables attached to the "left" side of each CMA/A 182 in FIG. 16 (a) for connection to the PMs 12, although they are not explicitly shown.
FIG. 16 (b) shows the cabling for the situation in which there are three to eight additional PMs 12 beyond 64. Two Type-A boards 170 are required in CMA/A #1 and each associated connector on the Type-D board 176 must have a cable harness assembly 178 attached to maintain a balanced bandwidth between CMA/A #0 and CMA/A #1. A connection is moved from CMA/A #0 to CMA/A #1 for each Type-A board 170 added until there are at least four. At that point, the bandwidth is as evenly split as possible using one cable harness assembly 178. Again, within each CMA/A 182, it does not matter to which of the eight possible connection points four of the cable connectors are attached. It also does not matter which four of the cables in the cable harness assembly 178 go to which CMA/A 182; they just have to be evenly divided to maintain uniform bandwidth. In any event, the network 14 would still function correctly.
FIG. 16 (c) shows the cabling for the situation in which there are 9-18 additional network I/O ports 20 beyond 64.
FIG. 16 (d) shows the cabling for the situation in which there are 19-78 additional network I/O ports 20 beyond 64.
FIG. 17 shows the cabling for the situation in which there are 512 network I/O ports 20 in the network 14. Twelve CMAs are present comprising eight CMA/As 182 that are fully populated with eight Type-A boards 170 (and two Type-D boards 176), and four CMA/Bs 184 with each group populated with four Type-B boards 172. All of the CMAs are housed in two docked cabinets (not shown). Eight cable harness assemblies are used to connect the CMA/As 182 to the CMA/Bs 184. The bandwidth of this network 14 can be reduced in increments of 1/8th by depopulating Type-B boards 172 from any CMA/B 184, four at a time. For each set of four Type-B boards 172, i.e., one group, removed from a CMA/B 184, the corresponding cable harness assembly 178 is also eliminated. The main reason to depopulate would be to lower the cost of the network 14 without losing functionality.
FIG. 18 shows the cabling for the situation in which there are more than 512 network I/O ports 20 in the network 14. To configure a network 14 with more than 512 PMs 12 requires the use of a Type-C board 174 in place of the Type-B board 172 and a change in the way the cabling is implemented. Twelve CMAs are present comprising eight CMA/As 182 that are fully populated with eight Type-A boards 170 (and two Type-D boards 176), and four CMA/Cs 186 with two groups that are populated with four Type-C boards 174. These CMAs are housed in two docked cabinets (not shown). Functionally, it is necessary to use the cable harness assembly 178 of FIG. 14 with the Type-C boards 174. A total of eight such cable harness assemblies are required to connect the CMA/As 182 with the CMA/Cs 186 in FIG. 17. For each set of four Type-C boards 174, i.e., one group, removed from a CMA/C 186, the corresponding cable harness assembly 178 is also eliminated. The main reason to depopulate would be to lower the cost of the network 14. Depopulating also reduces cabling.
The Universal Wiring Pattern is embodied by the cable harness assembly 178. To cross-connect the docked cabinets each cable harness assembly 178 is cut in the middle and attached to connectors 18. This allows the cabinets to be connected via cable bundles 190 that contain parallel wires. The constraints on the way in which the cable bundles 190 are connected between cabinets are similar to the intra-cabinet cabling discussed earlier. The two rules are: (1) two ends of the same cable bundle 190 shall not be plugged into the same connector types; and (2) the cable bundles 190 shall be uniformly distributed among all docked cabinets. As a result, there is tremendous flexibility in the configurations and in the connections of the network 14.
FIG. 19 shows the cabling for the situation in which there are 1024 network I/O ports 20 in the network 14. Each pair of docked cabinets 188 contains twelve CMAs. Eight CMA/As 182 are fully populated with eight Type-A boards 170 (and two Type-D boards 176) each, and four CMA/Cs 186 with two groups are populated with four Type-C boards 174. In this case, to balance the bandwidth, four cable bundles 190 each connect the cabinets 188 to themselves and another eight cables cross-connect into each other. The configuration shown is cabled for 100% of the maximum possible bandwidth. At the 50% level, the cable bundles 190 shown in dashed lines would be removed as well as all Type-C boards 174 in the lower docked cabinet 188 pair labeled as #1.
FIG. 20 shows the largest possible configuration of 4096 network I/O ports 20 using eight pairs of docked cabinets 188 to house the network 14. A total of 64 cable bundles 190 are needed in this case. The bandwidth can be lowered by removing sets of Type-C boards 174, one docked cabinet 188 pair at a time. For each docked cabinet 188 pair, eight cable bundles 190 are removed.
Notice that the lines representing the cable bundles 190 in FIG. 20 form the Universal Wiring Pattern (UWP). This is because there are 64 copies of the UWP used to connect stage 1 switch nodes 16 to stage 2 switch nodes 16, and the wires that form each cable bundle 190 have been chosen to be from the same location in each of the 64 copies, i.e., it is as though the 64 UWPs were all stacked on top of each other.
Any configuration other than those illustrated can be readily constructed by following the minimal construction rules outlined above. It is understood that the manufacturing, field service, and marketing organizations may wish to impose additional rules for the sake of simplicity and/or minimizing the number of different configurations. Of note, however, is the ability to configure any network 14 size using the smallest possible amount of hardware that gets the job done. In particular, an entry level network 14 can be offered with two depopulated CMA/As 182, which keeps the cost as low as possible.
12. SWITCH NODE ADDRESSING
Referring again to FIG. 4, each 8x8 switch node 16 has a 12 bit chip address register 121 that is used for specifying the switch node 16 location in the network 14. This location, called the chip's address, is defined as:
c.sub.11 c.sub.10 . . . c.sub.0
The bit positions are defined in Table IV. At startup, the chip address register 121 is loaded from the DP 140.
The Right/Left bit, c.sub.11, distinguishes between switch nodes 16 that route traffic to the right from the PM 12 to the bounce-back point 30 in the folded network 14, versus switch nodes 16 that route traffic to the left from the bounce-back point 30 in the folded network 14 to the PM 12. Bit c.sub.11 is set to 0.sub.2 for those switch nodes 16 with right arrows, #1 and #3, on Type-A boards 170 and Type-B boards 172 as shown in FIG. 6 and FIG. 7. Bit c.sub.11 is set to 1.sub.2 for those switch nodes 16 with left arrows, #2 and #4, on Type-A boards 170 and Type-B boards 172 as shown in FIG. 6 and FIG. 7.
The Stage number, c.sub.10 c.sub.9, is 00.sub.2 for those switch nodes 16 on Type-A boards 170 that connect to controllers 18. They are under the "Stage 0" label in FIG. 6. Bits c.sub.10 c.sub.9 are 01.sub.2 for those switch nodes 16 on Type-A boards 170 under the "Stage 1" label in FIG. 6. On the Type-B board 172 shown in FIG. 7, all four of the switch nodes 16 have their c.sub.10 c.sub.9 bits set to 10.sub.2.
Bits c.sub.8 . . . c.sub.0 determine the switch node 16 Level number in the network 14. This number, appended at the least significant end with a three bit switch node 16 port number, p.sub.2 p.sub.1 p.sub.0, defines the Level of the network I/O port 20 in the network 14, i.e., c.sub.8 . . . c.sub.0 p.sub.2 p.sub.1 p.sub.0.
Bits c.sub.2 c.sub.1 c.sub.0 are derived for every switch node 16 on a Type-A board 170 from its slot location in the CMA/A 182. The locations are encoded in four dedicated pins per slot from the backplane 180. The encoding begins with 0000.sub.2 in the leftmost board slot (the power board) and ends with 1010.sub.2 in the rightmost board slot. The DP 140 translates these physical numbers into the logical three bit number, c.sub.2 c.sub.1 c.sub.0, needed. After translation, the leftmost Type-A board 170 slot is assigned 000.sub.2. Each subsequent Type-A board 170 is assigned a number which increases by 1 (skipping over Type-D slots) up to the rightmost Type-A board 170, which is 111.sub.2.
Bits c.sub.3 c.sub.2 c.sub.1 are derived for every switch node 16 on a Type-B board 172 from its slot location in the CMA/B 184. The locations are encoded with four dedicated pins per slot from the backplane 180. The encoding begins with 0000.sub.2 in the leftmost board slot (the power board) and ends with 1010.sub.2 in the rightmost board slot. The DP 140 translates these physical numbers into the logical three bit number, c.sub.3 c.sub.2 c.sub.1, needed. After translation, the leftmost Type-B board 172 is assigned 000.sub.2. Each subsequent Type-B board 172 is assigned a number which increases by 1 up to the rightmost Type-B board 172, which is 111.sub.2.
Bit c.sub.0 is 0 for the upper two switch nodes 16 on a Type-B board 172 and 1 for the lower two switch nodes 16.
For a CMA/A 182, bits c.sub.5 c.sub.4 c.sub.3 are derived from the CMA's location in the cabinet 188. For a CMA/B 184 or CMA/C 186, bits c.sub.5 c.sub.4 are derived from the CMA's location in the cabinet 188. They are the same for all switch nodes 16 on every board in the same CMA. The DP 140 derives these bits as described in the dynamic configuration procedure, described hereinafter, and stores them into each switch node 16 to which it is connected.
Bits c.sub.8 c.sub.7 c.sub.6 are derived from the most significant three bits of the four least significant bits of the cabinet 188 number. One docked cabinet 188 pair has an even cabinet 188 number and the other in the pair has the next larger number. The cabinet 188 number is determined during the dynamic configuration procedure by the DP 140 in the power subsystem, i.e., the gateway DP (not shown). This number is distributed to all DPs 140 in the cabinet 188 by the LAN 144 interconnecting the DPs 140. Each DP 140 stores the number into each switch node 16 on a board to which it is connected. For networks 14 with no more than 512 network I/O ports 20, in the case of a local area network 14 failure, these bits are set to 0. For networks 14 with no more than 512 network I/O ports 20, these bits are the same in every switch node 16 in the network 14. For networks 14 with more than 512 network I/O ports 20, the cabinets 188 containing one network 14 are numbered sequentially, starting with an even number.
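By way of illustration, the bit fields described above for a switch node 16 on a Type-A board 170 can be assembled as in the following Python sketch. The function and parameter names are hypothetical, the inputs are assumed to be the logical values after translation by the DP 140, and the Type-B layout (c.sub.5 c.sub.4, c.sub.3 c.sub.2 c.sub.1, c.sub.0) is not covered.

```python
def cabinet_bits(cabinet_number: int) -> int:
    """c8 c7 c6: top three of the four least significant cabinet bits."""
    return (cabinet_number >> 1) & 0b111


def type_a_chip_address(routes_left: bool, stage: int,
                        cabinet_number: int, cma: int, slot: int) -> int:
    """Assemble the 12-bit chip address c11..c0 for a Type-A switch node."""
    if stage not in (0, 1):
        raise ValueError("Type-A boards hold Stage 0 and Stage 1 only")
    if not 0 <= cma <= 7 or not 0 <= slot <= 7:
        raise ValueError("CMA and slot fields are three bits each")
    return (
        (int(routes_left) << 11)                # c11: 0 = right, 1 = left
        | (stage << 9)                          # c10 c9: stage number
        | (cabinet_bits(cabinet_number) << 6)   # c8 c7 c6: cabinet pair
        | (cma << 3)                            # c5 c4 c3: CMA/A location
        | slot                                  # c2 c1 c0: logical slot
    )
```

Because the least significant cabinet bit is discarded, both cabinets of a docked pair yield the same c.sub.8 c.sub.7 c.sub.6 value in this sketch.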
13. AUTOMATIC PROCESSOR ID ASSIGNMENT
Automatic processor identification assignment consists of the ability to plug a PM 12 into any available network I/O port 20 and have it receive a unique port identifier from the network 14. Thus, each PM 12 in the network 14 can determine its address in either network 14 by simply asking the network 14. This means that it does not matter where any given PM 12 is plugged into the network 14. This greatly simplifies network 14 installation.
The PM's address within a given network 14 is determined by the Level number of the network I/O port 20 to which it is connected in that network 14. The PM 12 determines its address in each network 14 by transmitting a Send-Port-Addr command to the network 14. The switch node 16 that receives this command supplies the network I/O port 20 address via the Escape reply with a Port-Addr-Is key and the address itself.
Bits c.sub.8 . . . c.sub.0 determine the Level number of the switch node 16 in the network 14. This number, appended at the low order end with a three bit switch node 16 port number, p.sub.2 p.sub.1 p.sub.0, defines the Level of the network I/O port 20: c.sub.8 . . . c.sub.0 p.sub.2 p.sub.1 p.sub.0. This is the address that is supplied to a PM 12 when it asks the network 14 where it is attached.
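The construction of the port address from the chip address and the port number can be sketched as follows. This Python fragment is illustrative only; the function name is hypothetical, and the chip address is assumed to be held as a 12-bit integer with c.sub.0 as the least significant bit.

```python
def port_address(chip_address: int, port: int) -> int:
    """12-bit port Level: c8..c0 followed by p2 p1 p0."""
    if not 0 <= chip_address < (1 << 12):
        raise ValueError("chip address is 12 bits")
    if not 0 <= port <= 7:
        raise ValueError("port number is three bits")
    level = chip_address & 0x1FF    # keep c8..c0, drop Right/Left and Stage
    return (level << 3) | port
```

Nine Level bits plus three port bits give 4096 distinct values, which matches the largest network 14 described above.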
14. DYNAMIC CONFIGURATION
FIG. 21 is a flow chart describing the steps required for configuring the network 14. Since cables connecting the boards in the network 14 can be configured in relatively arbitrary ways, the network 14 automatically determines how it is cabled and uses that configuration to establish the path between PMs 12. A protocol between switch nodes 16 permits one switch node 16 to ask another switch node 16 at the other end of a back channel 34 to transmit its chip address back via the forward channel 32. These chip addresses are used to build the tag mapping tables 108, which ensure that routing tags can be correctly interpreted to establish communication paths between PMs 12.
After a PM 12 is powered up, it performs a self test procedure to test the links of the network 14. It then transmits a Send-Port-Addr command to the network 14 and waits for an Escape reply with a Port-Addr-Is key on the back channel 34 which contains the 12-bit address for the PM 12 on the network 14.
If the state of the network 14 is "configuring", the PM 12 volunteers to perform the configuration task. A local DP 140, i.e., a DP 140 on the Type-A board 170 connected to the PM 12, signals whether the PM 12 has been accepted or rejected as the Master PM 12 (only one PM 12 per network 14 may be designated as a Master PM 12). If it is rejected, the PM 12 disconnects from the DP 140 and waits to be notified that the configuration is complete. If it is accepted, the Master PM 12 configures the network 14.
The configuration steps determine the topology of the network 14 and account for any switch nodes 16 or links that fail a self-test. The Master PM 12 constructs the tag mapping tables 108 that account for the topology. The network 14 is available for use once these tables 108 are reloaded in the switch nodes 16.
At startup, each DP 140 fills in the chip address register 121 of each switch node 16 on its board, i.e., bits c.sub.11 -c.sub.0. All switch nodes 16, except switch nodes 16 in the "right" stage 0 connected to the controllers 18, activate their forward channel 32 carriers after the DP 140 has enabled all output ports of the switch node 16 by setting enable bits in each output status register 92. The DP 140 also enables the input ports of the switch node 16 by setting enable bits in each input status register 66.
Each input port of a switch node 16 is instructed by the DP 140 to test its back channel 34 by transmitting an Escape Reply with a Send-Chip-Addr key. Each output port that receives the Send-Chip-Addr key on its back channel 34 reads its chip address register 121 and sends the Chip-Addr-Is command out the forward channel 32. Receipt of the Chip-Addr-Is command by each input port on every switch node 16 that requested the chip address constitutes a test of all forward and back channel links.
When the Chip-Addr-Is command is received by an input port of a switch node 16, the DP 140 stores the address in RAM 142. The DP 140 builds a table with eight entries per switch node 16 that identifies where each input port is connected. The DP 140 reads the input status register 66 of each input port on each switch node 16 and constructs an eight bit input enable vector for each switch node 16 that indicates which ports are receiving a carrier. The DP 140 reads the output status register 92 of each output port on each switch node 16 and constructs an eight bit output enable vector for each switch node 16 that indicates which ports are receiving a carrier. Collectively, this information, and the type and location of faults detected by DPs 140, represents the raw topology of the network 14. The raw topology information is redundant by virtue of the fact that the network 14 is symmetric and folded.
The Master PM 12 gets the raw topology information from the DPs 140 via the LAN 144 interconnecting the network DPs 140 and the local DP 140 of the Master PM 12. A local DP 140 is that DP 140 on a Type-A board 170 which is connected to a stage 0 switch node 16 that is directly connected to the controller 18 of a PM 12. The Master PM 12 sends the network 14 a DP Connect command and the local DP 140 returns the raw topology information associated with its local switch nodes 16 to the Master PM 12. The local DP 140 then requests that every other DP 140 in the network 14 transmit its raw topology information, so it can be passed to the Master PM 12.
Once the Master PM 12 has received all the raw topology information, it calculates the tag mapping tables 108, multicast port select vectors, and input and output enable vectors for each switch node 16 in the network 14. The calculation includes a consistency check on the data and a validation check to make sure no cabling rules have been violated. The information for the tag mapping tables 108 for each of the switch nodes 16 is derived from the chip addresses, either of the switch node 16 in the next stage connected directly thereto, or of the switch node 16 in the following stage. The tag mapping table 108 needs only ⌈log.sub.2 b⌉ bits per entry rather than ⌈log.sub.2 N⌉ bits, e.g., 3 bits versus 12 bits.
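The 3 bits versus 12 bits figure follows from the ceiling of the base-2 logarithm for b=8 switch node ports and N=4096 network I/O ports 20. A minimal Python check, with a hypothetical function name, is:

```python
def bits_needed(count: int) -> int:
    """Ceiling of log2(count) for count >= 1, using integer arithmetic."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1).bit_length()


entry_bits = bits_needed(8)        # bits per tag mapping table entry, b = 8
address_bits = bits_needed(4096)   # bits in a full port address, N = 4096
```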
If any faults are reported, the calculations simulate the removal of the faulty component by deleting the appropriate entries in the raw topology information. For example, if a switch node 16 has failed, up to 16 links may be deleted. The output enable vectors are set to disable output ports where the links have been removed so that the load balancing logic will not select those ports. The tag mapping tables 108 also must not point to a disabled output port or an error will be reported if a routing tag references the output port. Input ports are disabled so that no spurious errors will be reported, i.e., the output ports they are connected to are disabled and/or faulty, so they are either sending nothing, which is an error, or garbage, which has already been diagnosed.
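The simulated removal of a faulty component can be pictured with the following Python sketch. It is an assumption-laden illustration, not the disclosed calculation: the Link record, the integer node identifiers, and both function names are hypothetical, and the raw topology is reduced to a flat list of output-port to input-port links.

```python
from typing import NamedTuple


class Link(NamedTuple):
    src: int        # switch node driving the link
    out_port: int   # output port number 0..7 on src
    dst: int        # switch node receiving the link
    in_port: int    # input port number 0..7 on dst


def remove_faulty(links, failed_nodes):
    """Drop every link that starts or ends at a failed switch node."""
    return [l for l in links
            if l.src not in failed_nodes and l.dst not in failed_nodes]


def enable_vectors(links, node):
    """Return (input_vector, output_vector) as eight bit masks for node."""
    in_vec = 0
    out_vec = 0
    for l in links:
        if l.dst == node:
            in_vec |= 1 << l.in_port     # input port still has a live link
        if l.src == node:
            out_vec |= 1 << l.out_port   # output port still has a live link
    return in_vec, out_vec
```

In this sketch a failed switch node with all eight inputs and all eight outputs cabled accounts for the 16 deleted links mentioned above, and the ports at the far ends of those links drop out of their neighbors' enable vectors.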
If the failure is in a non-local switch node 16 or link (one not directly connected to a controller 18), the redundant nature of the network 14 guarantees that the tag mapping tables 108, multicast port select vectors, and input and output enable vectors can be computed with no loss of functionality, although there is a slight decrease in the bandwidth of the network 14. It may not be possible to preserve functionality if there are multiple failures, depending upon the specific combination of failures.
If there are one or more failures of local switch nodes 16 or links, the network 14 can be configured to be functional for point-to-point communications only if the controllers 18 connected to the faulty components are disabled. The network 14 cannot be used for broadcast or multicast. The other network 14 is used for that purpose.
When the calculation of the tag mapping tables 108, multicast port select vectors, and input and output enable vectors is complete, the Master PM 12 re-establishes connection with its local DP 140 and transfers the tag mapping tables 108, multicast port select vectors, and input and output enable vectors in packages grouped by switch node 16. As the local DP 140 receives each switch node 16 package, it transmits the package to the appropriate DP 140. The process continues until all DPs 140 have received the packages for every switch node 16.
When each DP 140 receives the package, it selects the correct switch node 16 and writes eight tag mapping tables 108 into the output port select 58 in each IPL 36 of the switch node 16. The DP 140 then enables and disables the eight input ports of the switch node 16 according to the selected eight-bit input enable vector, one bit per input status register 108; the DP 140 also enables and disables the eight output ports of the switch node 16 according to the selected eight-bit output enable vector, one bit per output status register 108. The multicast port select register 130 of the switch node 16 is also loaded with the correct multicast port select vector. Upon completion of this task for each switch node 16, the DP 140 signals the local DP 140 with an acknowledgement.
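The per-port application of the eight-bit enable vectors can be illustrated with a small Python model. This is a sketch under stated assumptions: the class and method names are hypothetical, bit i is assumed to control port i, and a 1 bit is assumed to mean "enabled"; the specification does not state the bit ordering or polarity.

```python
from dataclasses import dataclass, field
from typing import List

PORTS = 8  # eight input ports and eight output ports per switch node


def decode_enable_vector(vector: int) -> List[bool]:
    """Expand an eight-bit vector into one enable flag per port (bit i -> port i)."""
    if not 0 <= vector < (1 << PORTS):
        raise ValueError("enable vector must fit in eight bits")
    return [bool((vector >> port) & 1) for port in range(PORTS)]


@dataclass
class SwitchNodeModel:
    """Hypothetical software model of the registers a DP writes on one switch node."""
    input_enabled: List[bool] = field(default_factory=lambda: [False] * PORTS)
    output_enabled: List[bool] = field(default_factory=lambda: [False] * PORTS)
    multicast_port_select: int = 0

    def configure(self, in_vec: int, out_vec: int, mcast_vec: int) -> None:
        """Apply one configuration package to this node."""
        self.input_enabled = decode_enable_vector(in_vec)
        self.output_enabled = decode_enable_vector(out_vec)
        self.multicast_port_select = mcast_vec


node = SwitchNodeModel()
node.configure(in_vec=0b11111111, out_vec=0b11111011, mcast_vec=0b00000001)
```

In this example output port 2 is left disabled, as it would be if the link attached to it had been deleted during fault simulation.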
When the local DP 140 determines that all switch nodes 16 have been configured, it signals the Master PM 12 that the configuration is complete. The Master PM 12 then signals the local DP 140 to change the state of the network 14 from "configuring" to "ready." The local DP 140 broadcasts the state change to all other DPs 140 via the LAN 144 connecting the DPs 140. The network 14 is then ready for use.
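The completion handshake described above amounts to tracking outstanding acknowledgements and then changing the network state. The following Python sketch models only that bookkeeping; the class name, method names, and node identifiers are hypothetical, and the LAN transport is omitted.

```python
class LocalDPModel:
    """Hypothetical model of the local DP's acknowledgement tracking."""

    def __init__(self, switch_nodes):
        self.pending = set(switch_nodes)  # nodes not yet acknowledged
        self.state = "configuring"

    def acknowledge(self, node) -> bool:
        """Record an acknowledgement; return True once every node is configured."""
        self.pending.discard(node)
        return not self.pending

    def set_ready(self) -> None:
        """State change requested by the Master PM after configuration completes."""
        if self.pending:
            raise RuntimeError("configuration is not complete")
        self.state = "ready"


dp = LocalDPModel(["node0", "node1"])
first_done = dp.acknowledge("node0")   # one node still outstanding
second_done = dp.acknowledge("node1")  # all nodes acknowledged
dp.set_ready()
```

Refusing the state change while acknowledgements are outstanding mirrors the ordering in the text, in which the Master PM signals "ready" only after being told the configuration is complete.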
Any PMs 12 that query the local DP 140 for the current state of the network 14 will find out that it is ready for use. At this point, all active PMs 12 execute a distributed algorithm to build their processor routing tables. These routing tables comprise the list of active PMs 12 and their addresses in the network 14.
A PM 12 that has just initialized and determines that either or both networks 14 are in the ready state notifies the other PMs 12 of its presence in the network 14. The PM 12 multicasts its network I/O port 20 address on each network 14 to all other PMs 12. By merging replies using an addition mode, the PM 12 knows how many PMs 12 have received the multicast. Each receiving PM 12 adds the network I/O port 20 address to its table of PM 12 locations on the indicated network 14.
A flag is set to note if either network 14 is to be used for point-to-point traffic only. In such a case, some PMs 12 are not included on the list for that network 14, but are on the list of the network 14 capable of performing multicasts.
Each PM 12 transmits a point-to-point message to the sending PM 12 of the multicast indicating its I/O port address on each network 14. The sending PM 12 can then build its PM 12 routing tables from the point-to-point addresses received. Thus, an existing network 14 can be expanded online.
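The announcement exchange described in the preceding paragraphs can be sketched for a single network as follows. The Python below is an illustrative model only: the class and function names are hypothetical, addresses are plain integers, and the multicast and point-to-point transports are replaced by direct method calls.

```python
class PMModel:
    """Hypothetical model of one PM's routing table on a single network."""

    def __init__(self, address: int):
        self.address = address
        self.table = set()  # network I/O port addresses of known peers

    def on_multicast(self, sender_address: int) -> int:
        """Record the announcing PM; return this PM's share of the merged reply."""
        self.table.add(sender_address)
        return 1  # addition-mode merge sums one per receiving PM


def announce(new_pm: PMModel, active_pms: list) -> int:
    """Announce new_pm to active_pms; return the merged receiver count."""
    # Multicast step: replies are merged by addition into a receiver count.
    count = sum(pm.on_multicast(new_pm.address) for pm in active_pms)
    # Point-to-point step: each receiver reports its own address to the sender.
    for pm in active_pms:
        new_pm.table.add(pm.address)
    return count


existing = [PMModel(1), PMModel(2), PMModel(3)]
newcomer = PMModel(4)
receivers = announce(newcomer, existing)
```

The merged count lets the announcing PM know how many point-to-point replies to expect, so it can tell when its routing table is complete.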
FIG. 22 is a flow chart describing the steps required for reconfiguring the network 14 when a fault occurs therein. If a fault is detected, the DP 140 can request that the network 14 be reconfigured so that the fault can be isolated. Communications in the faulty network 14 are interrupted during reconfiguration. However, communications within the system 10 are not interrupted because there are two networks 14. The controllers 18 in each PM 12 automatically switch over to the operational network 14 until the reconfiguration is complete, and then return to load balancing traffic between the two networks 14.
For the most part, the reconfiguration steps are similar to the steps performed at network 14 startup. What is different is that the configuring Master PM 12 identifies the fault location via information received from the switch nodes 16 and DPs 140.
In FIG. 22, a continuous loop executes so long as there are any unprocessed faulty links or nodes. Within the loop, faulty links and switch nodes 16 are processed according to their location on either side of the bounce-back point 30.
For a faulty "left" link, i.e., a fault on a link between switch nodes 16 in the left half of an unfolded network 14, including links connected to the output of the last stage, the Master PM 12 traces back on the link and disables the output port of the connected switch node 16. If this results in all the output ports on the connected switch node 16 being disabled, then the connected switch node 16 is marked as bei