Home
Patent Search
IMT Blog
REGISTER
|
SIGN IN
United States Patent
5530809
Douglas , ; et al.
June 25, 1996
Title
Router for parallel computer including arrangement for redirecting messages
Abstract
A digital computer comprising a plurality of message generating nodes interconnected by a routing network. The routing network transfers messages among the message generating elements in accordance with address information identifying a destination message generating element. Each message generating node includes a message data generator and a network interface. The message data generator generates message data items each including an address data portion comprising a destination identifier. The network interface includes a message generator and an address translation table, the table including a plurality of entries identifying, for at least one destination identifier, a translated destination identifier. The message generator, in response to the receipt of a message data item from the message data generator, generates a message for transmission to the routing network. In generating the message, the message generator performs an address translation operation in connection with the address data and the contents of the address translation table to generate updated address data which it uses data in connection with generating address information for the message.
Inventors:
Douglas; David C.
(Concord,
MA
)
, Leiserson; Charles E.
(Winchester,
MA
)
, Kuszmaul; Bradley C.
(Waltham,
MA
)
, Yang; Shaw-Wen
(Concord,
MA
)
, Hillis; W. Daniel
(Cambridge,
MA
)
, Wells; David
(Bolton,
MA
)
, Feynman; Carl R.
(Acton,
MA
)
, Walker; Bruce J.
(Framingham,
MA
)
, Kahle; Brewster
(San Francisco,
CA
)
Assignee:
Thinking Machines Corporation
(Bedford,
MA
)
Appl. No.:
181711
Filed:
January 14, 1994
Current U.S. Class:
709/250
709/238
709/245
Field of Search:
395/200,800,200.02,200.2 370/85.1,85.6,92,93
U.S. Patent Documents
4562539
December 1985
Vince
5133053
July 1992
Johnson et al.
5185860
February 1993
Wu
Primary Examiner:
Lee; Thomas C.
Assistant Examiner:
Dinh; D.
Attorney, Agent or Firm:
Jordan; Richard A.
Parent Case Text
This application is a divisional of U.S. patent application Ser. No. 07/946,242 filed Sep. 16, 1992, now U.S. Pat. No. 5,333,268 which is a continuation of U.S. patent application Ser. No. 07/592,029 filed Oct. 3, 1990 and now abandoned.
Claims
What is claimed as new and desired to be secured by Letters Patent of the United States is:
1. A digital computer comprising a plurality of message generating nodes interconnected by a routing network:
A. the routing network transferring messages among said message generating nodes in accordance with address information identifying a destination message generating element to receive the message;
B. each message generating node including:
i. a message data generator for generating message data items each including an address data portion comprising a destination identifier;
ii. an interface including:
a) an address translation table including a plurality of entries each identifying, for at least one destination identifier, a translated destination identifier;
b) a message generator for generating, in response to the receipt of a message data item from said message data generator, a message for transmission to the routing network, said message generator including an address translator comprising:
i) a chunk size identifier for identifying a chunk size, said chunk size identifying a number of consecutive message generating nodes;
ii) a window extraction circuit for generating an address translation table entry identifier in response to said chunk size identifier and said address data and coupling said address translation table entry identifier to said address translation table, the address translation table selecting an entry in response to the received address translation table identifier and the original address values in said entries, said address translation table providing the translated address value from the selected entry at an output;
iii) a window insertion circuit for generating said updated address value in response to said chunk size identifier, the address data and the translated address value from the output of said address translation table;
said message generator using the updated address data in connection with generating address information for the message, the message generator coupling the message to said routing network.
2. A digital computer as defined in claim 1 in which the address data comprises a series of address digits, said window extraction circuit generating said address translation table entry identifier as a selected series of said address digits, the window extraction circuit selecting the series of address digits in response to the chunk size identifier.
3. A digital computer as defined in claim 1 in which said address value, said updated address value and said translated address value each comprises a respective series of address digits, said window insertion circuit generating said updated address value by substituting the series of address digits comprising said translated address value as a selected series in the address data, the window insertion circuit selecting the digits of the address data for which it substitutes the translated address value in response to the chunk size identifier.
4. A digital computer as defined in claim 1 wherein each message data item further includes an address mode flag having a plurality of conditions identifying the address data portion as having one of a plurality of address modes, said message generator selectively using the address data from the message data item or the updated address data from said address translator in generating a message in response to the message data item in response to the condition of the address mode flag of a message data item.
5. A digital computer as defined in claim 4 in which each message generating node is identified by a network identifier, in one address mode the address data in the address data portion of the message data item containing the network identifier of the message generating node to receive the message generated by said message generator in response thereto.
6. A digital computer as defined in claim 4 in which each message generating node is identified by a network identifier, in one address mode the address data in the address data portion of the message data item containing a relative address value identifying the difference between the network identifiers of the message generating node to receive the message and a predetermined message generating node.
7. A digital computer as defined in claim 4 in which said message generator further includes:
A. a latch;
B. an address mode flag decoder for decoding the address mode flag of a received message data item and generating an address mode signal to identify the address mode;
C. a multiplexer for selectively, in response to the address mode signal from said address mode flag decoder, coupling the address data from the message data item or the updated address data from said address translator as updated address data for storage in said latch; and
D. a control circuit for controlling storage of updated address data from said multiplexer in said latch.
8. A digital computer as defined in claim 7 wherein:
A. said address translation table provides an translated address value to said address translator in response to a translation enabling signal, and generates a translated address value valid signal when the translated address value provided to said address translator is valid; and
B. said control circuit includes:
i. an address translation table control circuit for selectively generating said translation enabling signal in response to said address mode signal;
ii. a latch control circuit for generating, in response to said address mode signal and the translated address value valid signal from said address translation table, an enabling signal for enabling said latch to store the updated address data from said multiplexer.
9. A digital computer as defined in claim 8 in which said latch has an output terminal connected to couple the contents of said latch to an updated address value processing stage, said updated address value processing stage generating an advance control signal for controlling the coupling of the contents of said latch thereto, said address translation table control circuit and said latch control circuit further operating in response to said advance control signal.
10. A digital computer as defined in claim 9 in which:
A. said message generator further includes delay indication generating means for generating, in response to a delay indication enabling signal, a delay indication to said updated address value processing stage to indicate a delay in generating an updated address value; and
B. said control circuit further includes a delay indication control circuit for generating said delay indication enabling signal in response to said advance control signal, said translated address value valid signal and said address mode signal, thereby to enable said delay indication generating means to generate the delay indication enabling signal in response to a delay by said address translation table in providing a translated address value if the address mode signal identifies a selected address mode in which the updated address data is generated in response to the translated address value.
11. In a digital computer comprising a plurality of message generating nodes interconnected by a routing network, the routing network transferring messages among said message generating nodes in accordance with address information identifying a destination message generating element to receive the message, a message generating node comprising:
A. a message data generator for generating message data items each including an address data portion comprising a destination identifier;
B. an interface including:
i. an address translation table including a plurality of entries each identifying, for at least one destination identifier, a translated destination identifier;
ii. a message generator for generating, in response to the receipt of a message data item from said message data generator, a message for transmission to the routing network, said message generator including an address translator comprising:
a) a chunk size identifier for identifying a chunk size, said chunk size identifying a number of consecutive message generating nodes;
b. a window extraction circuit for generating an address translation table entry identifier in response to said chunk size identifier and said address data and coupling said address translation table entry identifier to said address translation table, the address translation table selecting an entry in response to the received address translation table identifier and the original address values in said entries, said address translation table providing the translated address value from the selected entry at an output;
c. a window insertion circuit for generating said updated address value in response to said chunk size identifier, the address data and the translated address value from the output of said address translation table;
said message generator using the updated address data in connection with generating address information for the message, the message generator coupling the message to said routing network.
12. A message generating node as defined in claim 11 in which the address data comprises a series of address digits, said window extraction circuit generating said address translation table entry identifier as a selected series of said address digits, the window extraction circuit selecting the series of address digits in response to the chunk size identifier.
13. A message generating node as defined in claim 11 in which said address value, said updated address value and said translated address value each comprises a respective series of address digits, said window insertion circuit generating said updated address value by substituting the series of address digits comprising said translated address value as a selected series in the address data, the window insertion circuit selecting the digits of the address data for which it substitutes the translated address value in response to the chunk size identifier.
14. A message generating node as defined in claim 11 wherein each message data item further includes an address mode flag having a plurality of conditions identifying the address data portion as having one of a plurality of address modes, said message generator selectively using the address data from the message data item or the updated address data from said address translator in generating a message in response to the message data item in response to the condition of the address mode flag of a message data item.
15. A message generating node as defined in claim 14 in which each message generating node is identified by a network identifier, in one address mode the address data in the address data portion of the message data item containing the network identifier of the message generating node to receive the message generated by said message generator in response thereto.
16. A message generating node as defined in claim 14 in which each message generating node is identified by a network identifier, in one address mode the address data in the address data portion of the message data item containing a relative address value identifying the difference between the network identifiers of the message generating node to receive the message and a predetermined message generating node.
17. A message generating node as defined in claim 14 in which said message generator further includes:
A. a latch;
B. an address mode flag decoder for decoding the address mode flag of a received message data item and generating an address mode signal to identify the address mode;
C. a multiplexer for selectively, in response to the address mode signal from said address mode flag decoder, coupling the address data from the message data item or the updated address data from said address translator as updated address data for storage in said latch; and
D. a control circuit for controlling storage of updated address data from said multiplexer in said latch.
18. A message generating node as defined in claim 17 wherein:
A. said address translation table provides an translated address value to said address translator in response to a translation enabling signal, and generates a translated address value valid signal when the translated address value provided to said address translator is valid; and
B. said control circuit includes:
i. an address translation table control circuit for selectively generating said translation enabling signal in response to said address mode signal;
ii. a latch control circuit for generating, in response to said address mode signal and the translated address value valid signal from said address translation table, an enabling signal for enabling said latch to store the updated address data from said multiplexer.
19. A message generating node as defined in claim 18 in which said latch has an output terminal connected to couple the contents of said latch to an updated address value processing stage, said updated address value processing stage generating an advance control signal for controlling the coupling of the contents of said latch thereto, said address translation table control circuit and said latch control circuit further operating in response to said advance control signal.
20. A message generating node as defined in claim 19 in which:
A. said message generator further includes delay indication generating means for generating, in response to a delay indication enabling signal, a delay indication to said updated address value processing stage to indicate a delay in generating an updated address value; and
B. said control circuit further includes a delay indication control circuit for generating said delay indication enabling signal in response to said advance control signal, said translated address value valid signal and said address mode signal, thereby to enable said delay indication generating means to generate the delay indication enabling signal in response to a delay by said address translation table in providing a translated address value if the address mode signal identifies a selected address mode in which the updated address data is generated in response to the translated address value.
Description
INCORPORATION BY REFERENCE
Guy E. Blelloch, Scan Primitives and Parallel Vector Models, (Ph.D. Dissertation, Massachusetts Institute of Technology: 1988), incorporated herein by reference.
U.S. patent application Ser. No. 07/489,079, filed Mar. 5, 1990, now U.S. Pat. No. 5,118,975, in the name of W. Daniel Hillis, et al., entitled Digital Clock Buffer Circuit Providing Controllable Delay, and assigned to the assignee of the present application, incorporated herein by reference.
FIELD OF THE INVENTION
The invention relates generally to the field of digital computer systems, and more particularly to massively parallel computing systems. The invention particularly provides arrangements for controlling processors in a computing system having a large number of processors, for facilitating transfer of data among the processors and for facilitating diagnosis of faulty components in the computing system.
BACKGROUND OF THE INVENTION
A digital computer system generally comprises three basic elements, namely, a memory element, an input/output element and a processor element. The memory element stores information in addressable storage locations. This information includes data and instructions for processing the data. The processor element fetches information from the memory element, interprets the information as either an instruction or data, processes the data in accordance with the instructions, and returns the processed data to the memory element. The input/output element, under control of the processor element, also communicates with the memory element to transfer information, including instructions and the data to be processed, to the memory, and to obtain processed data from the memory.
Most modern computing systems are considered "von Neumann" machines, since they are generally constructed according to a paradigm attributed to John von Neumann. Von Neumann machines are characterized by having a processing element, a global memory which stores all information in the system, and a program counter that identifies the location in the global memory of the instruction being executed. The processing element executes one instruction at a time, that is, the instruction identified by the program counter. When the instruction is executed, the program counter is advanced to identify the location of the next instruction to be processed. (In many modern systems, the program counter is actually advanced before the processor has finished processing the current instruction.)
Von Neumann systems are conceptually uncomplicated to design and program, since they do only one operation at a time. A number of advancements have been made to the original von Neumann paradigm to permit the various parts of the system, most notably the various components of the processor, to operate relatively independently and achieve a significant increase in processing speed. One such advancement is pipelining of the various steps in executing an instruction, including instruction fetch, operation code decode (a typical instruction includes an operation code which identifies the operation to be performed, and in most cases one or more operand specifiers, which identify the location in memory of the operands, or data, to be used in executing the instruction), operand fetch, execution (that is, performing the operation set forth in the operation code on the fetched operands), and storing of processed data, which steps are performed relatively independently by separate hardware in the processor. In a pipelined processor, the processor's instruction fetch hardware may be fetching one instruction while other hardware is decoding the operation code of another instruction, fetching the operands of still another instruction, executing yet another instruction, and storing the processed data of a fifth instruction. Since the five steps are performed sequentially, pipelining does not speed up processing of an individual instruction. However, since the processor begins processing of additional instructions before it has finished processing a current instruction, it can speed up processing of a series of instructions.
A pipelined processor is obviously much more complicated than a simple processor in a von Neumann system, as it requires not only the various circuits to perform each of the operations (in a simple von Neumann processor, many circuits could be used to perform several operations), but also control circuits to coordinate the activities of the various operational circuits. However, the speed-up of the system can be dramatic.
More recently, some processors have been provided with execution hardware which includes multiple functional units each being optimized to perform a certain type of mathematical operation. For example, some processors have separate functional units for performing integer arithmetic and floating point arithmetic, since they are processed very differently. Some processors have separate hardware functional units each of which performs one or only several types of mathematical operations, including addition, multiplication, and division operations, and other operations such as branch control and logical operations, all of which can be operating concurrently. This can be helpful in speeding up certain computations, most particularly those in which several functional units may be used concurrently for performing parts of a single computation.
In a von Neumann processor, including those which incorporate pipelining or multiple functional units (or both, since both may be incorporated into a single processor), a single instruction stream operates on a single data stream. That is, each instruction operates on data to enable one calculation at a time. Such processors have been termed "SISD," for single-instruction/single-data." If a program requires a segment of a program to be used to operate on a number of diverse elements of data to produce a number of calculations, the program causes the processor to loop through that segment for each calculation. In some cases, in which the program segment is short or there are only a few data elements, the time required to perform such a calculation may not be unduly long.
However, for many types of such programs, SISD processors would require a very long time to perform all of the calculations required. Accordingly, processors have been developed which incorporate a large number of processing elements all of which may operate concurrently on the same instruction stream, but with each processing element processing a separate data stream. These processors have been termed "SIMD" processors, for "single-instruction/multiple-data."
SIMD processors are useful in a number of applications, such as image processing, signal processing, artificial intelligence, database operations, and computer simulation of a number of things, such as electronic circuits and fluid dynamics. In image processing, each processing element may be used to perform processing on a pixel ("picture element") of the image to enhance the overall image. In signal processing, the processors concurrently perform a number of the calculations required to perform such computations as the "Fast Fourier transform" of the data defining the signal. In artificial intelligence, the processors perform searches on extensive rule bases representing the stored knowledge of the particular application. Similarly, in database operations, the processors perform searches on the data in the database, and may also perform sorting and other operations. In computer simulation of, for example, electronic circuits, each processor may represent one part of the circuit, and the processor's iterative computations indicate the response of the part to signals from other parts of the circuit. Similarly, in simulating fluid dynamics, which can be useful in a number of applications such as weather predication and airplane design, each processor is associated with one point in space, and the calculations provide information about various factors such as fluid flow, temperature, pressure and so forth.
Typical SIMD systems include a SIMD array, which includes the array of processing elements and a router network, a control processor and an input/output component. The input/output component, under control of the control processor, enables data to be transferred into the array for processing and receives processed data from the array for storage, display, and so forth. The control processor also controls the SIMD array, iteratively broadcasting instructions to the processing elements for execution in parallel. The router network enables the processing elements to communicate the results of a calculation to other processing elements for use in future calculations.
Several routing networks have been used in SIMD arrays and others have been proposed. In one routing network, the processing elements are interconnected in a matrix, or mesh, arrangement. In such an arrangement, each processing element is connected to, and communicates with, four "nearest neighbors" to form rows and columns defining the mesh. This arrangement can be somewhat slow if processing elements need to communicate among themselves at random. However, the arrangement is inexpensive and conceptually simple, and may suffice for some types of processing, most notably image processing. The "Massively Parallel Processor" manufactured by Goodyear Aerospace Corporation is an example of a SIMD array having such a routing network.
In another routing network, processing elements are interconnected in a cube or hypercube arrangement, having a selected number of dimensions, for transferring data, in the form of messages, among the processing elements. The arrangement is a "cube" if it only has three dimensions, and a "hypercube" if it has more than three dimensions. U.S. Pat. No. 4,598,400, entitled Method and Apparatus For Routing Message Packets, issued Jul. 1, 1986 to W. Daniel Hillis, and assigned to the assignee of the present application, describes a system having a hypercube routing network. In the system described in the '400 patent, multiple processing elements are connected to a single routing node, and the routing nodes are interconnected in the hypercube.
Another routing arrangement which has been proposed is a crossbar switch, through which each processing element can communicate directly with any of the other processing elements. The crossbar switch provides the most efficient communications of any of the routing networks proposed. However, a crossbar switch also has the most connections and switching elements, and thus is the most expensive and also the most susceptible to failure due to broken connections and faulty switching elements. Thus, crossbar switch arrangements are rarely used, except when the number of processing elements is fairly small, since the complexity of a crossbar switch increases with the square of the number of processing elements.
Yet another routing arrangement is an omega network, in which switching is performed through a number of serially-connected stages. Each stage has two inputs, each connected to the outputs of a prior stage or processing elements, has two outputs which may be connected to the inputs of a subsequent stage or processing elements. The "Butterfly" computer system manufactured by Bolt Beranek & Newman uses such a network.
SUMMARY OF THE INVENTION
The invention provides a new and improved parallel computer system.
In brief summary, the new computer includes a plurality of processing elements, a command processor, a diagnostic processor and a communications network. The processing elements each performs data processing and data communications operations in connection with commands. The processing elements also performing diagnostic operations in response to diagnostic operation requests and providing diagnostic results in response thereto. The command processor generates commands for the processing elements, and also performs diagnostic operations in response to diagnostic operation requests and providing diagnostic results in response thereto. The diagnostic processor generates diagnostic requests. The communication network includes three elements, including a data router, a control network and a diagnostic network. The data router is connected to the processing elements for facilitating the transfer of data among them during a data communications operation. The control network is connected to the processing elements and the command processor for transferring commands from the command processor to the processing elements. The diagnostic network connected to the processing elements, the command processor and the diagnostic processor for transferring diagnostic requests from the diagnostic processor to the processing elements and the command processor and for transferring diagnostic results from the processing elements and the command processor to the diagnostic processor.
BRIEF DESCRIPTION OF THE DRAWINGS
This invention is pointed out with particularity in the appended claims. The above and further advantages of this invention may be better understood by referring to the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a general block diagram of a massively parallel computer system constructed in accordance with the invention;
FIGS. 2A and 2B are block diagrams useful in understanding the structure and operation of the data router of the computer system of FIG. 1;
FIG. 3 is a diagram depicting the structure of message packets transferred over the data router;
FIGS. 4A and 4B are block diagrams useful in understanding the structure and operation of the control network of the computer system of FIG. 1;
FIG. 5 is a diagram depicting the structure of message packets transferred over the control network;
FIGS. 6A through 6C are block diagrams useful in understanding the structure and operation of the diagnostic network of the computer system of FIG. 1;
FIG. 7 is a diagram depicting the structure of message packets transferred over the diagnostic network;
FIG. 8 is a general block diagram of a processing element in the computer system depicted in FIG. 1;
FIG. 9A-1 comprises a general block diagram of a data router interface circuit useful in interfacing the processing element depicted in FIG. 8 to the data router of the computer system depicted in FIG. 1, FIGS. 9A-2A and 9A-2B contain definitions of registers in the data router interface and FIGS. 9B-1 through 9D-7 comprise logic diagrams of the data router interface;
FIG. 10A comprises a general block diagram of a control network interface circuit useful in interfacing the processing element depicted in FIG. 8 to the control network of the computer system depicted in FIG. 1, FIGS. 10A-1 contains a definitions of a register in the control network interface and FIGS. 10B through 10G comprise logic diagrams of the control network interface;
FIG. 11A is a general block diagram of a data router node used in the data router described in connection with FIGS. 2A and 2B, and FIGS. 11B-1 through 11D comprise detailed block and logic diagrams of the data router node;
FIG. 12A is a general block diagram of a control network node used in the control network described in connection with FIGS. 4A and 4B, and FIGS. 12B-1 through 12D-1 comprise detailed block and logic diagrams of the control router node; and
FIG. 13A is a general block diagram of a diagnostic network node used in the diagnostic network described in connection with FIG. 6, and FIGS. 13B-1 through 13C comprise detailed block and logic diagrams of the diagnostic network node.
DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT
I. General Description
A. General Description Of Computer System
FIG. 1 is a general block diagram of a massively parallel computer system 10 constructed in accordance with the invention. With reference to FIG. 1, system 10 includes a plurality of processing elements 11(0) through 11(N) (generally identified by reference numeral 11), scalar processors 12(0) through 12(M) (generally identified by reference numeral 12) and input/output processors 13(0) through 13(K) (generally identified by reference numeral 13). Input/output units (not shown), such as, for example, disk and tape storage units, video display devices, printers and so forth may be connected to the input/output processors to supply information, including data and program commands, for processing by the processing elements 11 and scalar processors 12 in the system, and may also receive processed data for storage, display and printing. The scalar processors 12 may also be connected to input/output units including, for example, video display terminals which permit one or more operators to generally control system 10.
The system 10 further includes a control network 14, a data router 15 and a diagnostic network 16. The control network 14 permits one or more scalar processors 12 to broadcast program commands to the processing elements 11. The processing elements 11 execute the commands generally concurrently. The control network 14 also permit the processing elements 11 to transfer status information to the scalar processors 12. The control network 14 is also used by the processing elements 11 to perform selected types of arithmetic operations, termed "scan" and "reduce" operations, as described below. The control network 14 may also be used to provide synchronization among the processing elements 11.
The data router 15 transfers data among the processing elements 11, scalar processors 12 and input/output processors 13. In particular, under control of the scalar processors 12, the input/output processors 13 retrieve data to be processed from the input/output units and distributes it to the respective scalar processors 12 and processing elements 11. During processing, the scalar processors 12 and processing elements 11 can transfer data among themselves over the data router 15. In addition, the processing elements 11 and scalar processors 12 can transfer processed data to the input/output processors 13. Under control of the scalar processors 12, the input/output processors 13 can direct the processed data that they receive from the data router 15 to particular ones of the input/output units for storage, display, printing, or the like.
The diagnostic network 16, under control of a diagnostic processor (not shown), facilitates testing of other portions of the system 10 to identify, locate and diagnose defects. The diagnostic processor may comprise one or more of the scalar processors 12. In addition, the diagnostic network 16 may be used to establish selected operating conditions in the other portions of the system 10 as described below.
The system 10 is synchronous, that is, all of its elements operate in accordance with a global SYS CLK system clock signal provided by a clock circuit 17.
One particular embodiment of system 10 may include hundreds or many thousands of processing elements 11 operating on a single problem in parallel under control of commands broadcast to them by the scalar processors 12. In that embodiment, the processing elements 11 operate in parallel on the same command on their individual sets of data, thereby forming a parallel computer system. In addition, the system 10 may be dynamically logically partitioned, as described below, into multiple subsystems which may concurrently operate on separate problems or separate parts of a single problem. In that case, each partition includes at least one scalar processor 12 and a plurality of processing elements 11.
B. General Description Of Communications Networks
1. Data Router 15
Before proceeding to a detailed description of the system 10 and its various components, it would be helpful to generally describe the structures of the control network 14 and data router 15. The data router 15 and control network 14 both transfer information in the form of message packets, which will be described in detail below in connection with FIGS. 3 and 5, respectively. FIGS. 2A and 2B depict the general structure of the data router 15 and FIGS. 4A and 4B depict the general structure of the control network 14.
With reference to FIG. 2A, the data router 15 is generally tree-structured, having a plurality of data router node groups 20(i,j) ("i" and "j" are integers) organized in a plurality of levels each identified by the index "i" in reference numeral
20(i,j). A data router node group 20(i,j) at each level "i" is connected to a selected number of data router node groups 20(i-1,j) in the next lower level "i-1" to form a tree. As will be described in detail below, the data router node groups 20(i,j) perform message switching operations to transfer data, in the form of data router message packets, among the processing elements 11, scalar processors 12 and input/output processors 13, which are collectively identified as leaves 21(0) through 21(N) (generally identified by reference numeral 21). Each data router node group 20(1,j) in the lowest level is connected to one or more leaves 21. In the reference numeral 20(i,j), the index (j) uniquely identifies each of the data router node groups
20(i,j) at each level "i."
In the data router 15 represented in FIG. 2A, the data router node group 20(M,0) at the highest level "M" is termed the "physical root" of the tree. At each level "i", each data router node group 20(i,j) is termed the "parent" of data router node groups 20(i-1,j) connected thereto, and each data router node group 20(i-1,j) is termed a "child" of the data router node group 20(i,j) to which it is connected. It will be appreciated that the data router node group 20(i,j) will also be a child of the data router node group 20(i+1,j) connected thereto. In one particular embodiment, each data router node group 20(i,j) in a particular level "i" is connected to four child data router node groups 20(i-1,j); in that embodiment, the "fan-out" of the tree, that is, the number of children connected to each parent, is four. It will be appreciated from the following that the fan-out need not be constant, but may vary from level to level and also among data router node groups 20(i,j) within the same level.
The structure of the data router 15 is further termed a "fat-tree", and will be particularly described in connection with FIG. 2B. With reference to FIG. 2B, at least some of the data router node groups 20(i,j) includes at least one, and typically two or more data router nodes 22(i,j,k), wherein "k" is an integer that uniquely identifies each data router node within a data router node group 20(i,j). Each data router node 22(i,j,k) in a data router node group 20(i,j) is connected to a plurality of data router nodes 22(i+1,j,k) in level "i+1," with the connections being established so that the data router nodes 22(i,j,k) in each data router node group 20(i,j) are connected to different ones of the data router nodes 22(i+1,j,k) in the data router node group 20(i,j) in level "i+1." For example, in data router node group 20(1,0), data router node 22(1,0,0) is connected to data router nodes 22(2,0,0) and 22(2,0,1) of data router node group 20(2,0), and data router node 22(1,0,1) is connected to data router nodes 22(2,0,2) and 22(2,0,3) of data router node group 20(2,0).
In addition, each data router node 22(i,j,k) in a parent data router node group 20(i,j) is connected to one data router node 22(i-1,j,k) in that parent's child data router node groups 20(i-1,j). Accordingly, as shown in FIG. 2B, data router node (2,0,0) in data router node group 20(2,1) is connected to one data router node 22(1,j,0), where "j" equals 0, 1, 2 and 3, in each of the data router node groups 20(1,0) through 21(1,3).
It will be appreciated that the collection of data router nodes 22(i,j,k) from each leaf 21 to and including the data router nodes 22(m,0,k) in the root data router node group 20(M,0) essentially forms an inverted tree. Each leaf 21 effectively comprises the root of one inverted tree and the data router nodes 22(M,0,k) of the root data router node group 20(M,0) form all of the leaves of all of the inverted trees defined by the collection of leaves 21. The number of data router nodes 22(i,j,k) in each data router node group 20(i,j) at a particular level "i" in the tree defining data router 15 will be determined by the fan-out at each level from level "1" to level "i" in the inverted tree. The fan-out at a particular level "i" is the number of data router nodes 22(i+1,j,k) at level "i+1" to which each data router node 22(i,j,k) at level "i" is connected. Thus, for example, since data router node 22(1,0,0) of data router node group 20(1,0) in level "1" is connected to two data router nodes
22(2,0,0) and 22(2,0,1) of data router node groups 20(2,0) in level "2," the fan-out from data router node 22(1,0,0) is two. In one particular embodiment, the fan-out from data router nodes 22(i,j,k) at a particular level "i" is the same for the entire level, but it may differ from level to level as described below.
As noted above, the data router 15 transfers message packets among the processing elements 11, scalar processors 12 and input/output processors 13, all of which are represented by leaves 21. Each connection shown in FIG. 2B between a leaf 21 and a data router node 22(1,j,k) of level 1, which is represented by a line therebetween, actually represents two unidirectional data paths, one for transferring a message packet in each direction. Thus, for example, the connection between leaf 21(0) and data router node 22(1,0,0) of data router node group 20(1,0) represents two data paths. One data path is used by the leaf 21(0) to transmit a message packet to the data router node 22(1,0,0) for delivery to another leaf 21(x). The other data path is used by the data router node 22(1,0,0) to deliver message packets originating at other leaves 21 destined for the leaf 21(0).
Similarly, each connection between a data router node 22(i,j,k) of a level "i" and a data router node 22(i+1,j,k) of a level "i+1," which is also represented in FIG. 2B by a line, represents two unidirectional data paths, one for transferring a message packet in each direction. Thus, for example, the connection between data router node 22(1,0,0) of data router node group 20(1,0) and data router node 22(2,0,0) represents two data paths, one used to transfer message packets from data router node
22(1,0,0) to data router node 22(2,0,0) and the other to transfer message packets in the opposite direction, that is, from data router node 22(2,0,0) to data router node 22(1,0,0).
Transfer of a message packet from one leaf 21(x) to another leaf 21(y) through the data router 15 message transfer proceeds in two general operations. First, the data router nodes 22(i,j,k) transfer the message packet first "up the tree," that is, to data router nodes in successively higher levels, until it reaches a selected maximum level determined in part by the separation between the source and destination leaves. After a message packet has reached the selected maximum level, the transfer continues "down the tree", during which the data router nodes 22(i,j,k) transfer the message packet to data router nodes at successively lower levels until it is delivered to the destination leaf 21(y). As will be clear from the detailed description of the structure and operation of a data router node 22(i,j,k) in FIGS. 11A through 11D below, the data router 15 can transfer a plurality of messages concurrently, any of the data router nodes 22(i,j,k) can direct messages up the tree and other messages down the tree at the same time.
Before proceeding further, it may be helpful to describe the structure of a message packet transferred over the data router 15. With reference to FIG. 3, a data router message packet 30 includes three general portions, including a message address portion 31, a message data portion 32, and a checksum portion 33, each comprising one or more "flits." In one embodiment, each flit comprises four bits, which are transferred in parallel over a data router connection, that is, between a leaf 21
and a data router node 22(i,j,k) or between two data router nodes 22(i,j,k).
The message data portion 32 includes several elements, including a length flit 34, a tag flit 35 and one or more data flits 36(0) through 36(N) (generally identified by reference numeral 36). The data flits 36 generally contain the actual message data being transferred over the data router 15, which may vary from packet to packet. The tag flit 35 contains control information which may be used by the destination leaf, identified herein by reference numeral 22(y), in processing the data. The contents of the length flit 34 are identify the number of flits in the message data portion 32, and may vary depending on the amount of data being transferred in a particular packet. In one particular embodiment, the contents of length flit 34
identify the number of thirty-two bit words in the data flits 36 of the message packet. In that embodiment, the number of data flits 36 in the message packet is eight times the value in the length flit
The checksum portion 33 contains a value which is used in detecting errors in packet transmission over the data router 15.
The data router 15 uses the contents of the message address portion 31 to determine the path to be traversed by the message packet 30 from the source leaf to the destination leaf. The message address portion 31 includes a header 40, which identifies the selected maximum level to which the message packet is to be transferred when going up the tree, and a down path identification portion 41 which identifies the path down the tree to the destination leaf 21(y) when going down the tree. When directing a message packet up the tree, a data router node 22(i,j,k) at level "i," randomly selects one of the data router nodes 22(i+1,j,k) connected thereto in level "i+1" in data router node group 20(i+1,j) to receive the message packet. Other than specifying the selected maximum height for the message packet, the packet does not otherwise specify the particular path it is to take up the tree.
The down path identification portion 41 of message packet 30 defines the path the packet is to take down the tree from the data router node group 20(i,j) at the selected maximum level to the destination leaf 21(y). The down path identification portion includes one or more down path identifier fields 42(1) through 42(M) (generally identified by reference numeral 42). The successive down path identifier fields 42, beginning with field 42(M), are used by the data router nodes 22(i,j,k) at successively lower levels as they direct the packet downwardly in the tree.
The down path identifier field 42(i) for level "i" identifies the child data router node group 20(i-1,j) to which the parent data router node group 20(i,j) that receives the packet at level "i" is to direct the message packet 30. It will be appreciated that the down path identifer fields 42 need not specifically identify one of the data router nodes 22(i-1,j,k) in the data router node group 20(i,j) at each level to which the message packet is to be directed, since the path down the tree is effectively a traversal of the inverted tree of which the destination leaf 21(y) is the root.
In one embodiment, in which each parent data router node group 20(i,j) is connected to four child data router node groups 20(i-1,j) or four leaves 21, each down path identifier field 42 comprises two bits that are binary encoded to identify one of the four children to which the message is to be directed. As indicated by FIG. 3, two fields 42 are packed into a single four-bit flit in the message packet 30. Since one down path identifier field 42 is used to at each level (i) in the downward traversal, the number of down path identifier fields 42 required to define the downward path corresponds to the selected maximum level in the path up the tree, which, in turn, corresponds to the contents of header 40. During the downward traversal mode, the data router nodes 22(i,j,k) through which a message packet 30 passes decrement the contents of the header 40 and, after both down path identifier fields 42 contained in a flit have been used, discard the flit. Thus, the length and content of a message packet 30 may change as it is being passed down the tree.
It will be appreciated that the addressing arrangement provided by the header 40 and down path identification portion 41 can be viewed as follows. The selected maximum height in header 40 effectively identifies the data router node group 20(i,j) which is the root of a sub-tree, preferably the smallest sub-tree, of the data router 15 that contains both the source leaf 21(x) and the destination leaf 21(y). On the other hand, the down path identification portion 41 details the exact path from that root to the destination leaf 21(y).
The provision of increasing numbers of data router nodes 22(i,j,k) in data router node groups 20(i,j) at higher levels in the data router 15, thereby resulting in a "fat-tree" design, provides several advantages. In a massively parallel computer SIMD system, processing elements 11 typically transfer messages during a message transfer operation, initiated by commands from the scalar processors 12. During a message transfer operation, a large number of processing elements 11 may transfer messages concurrently. If the data router 15 did not have increasing numbers of data router nodes 22(i,j,k) at higher levels to which the message packets 30 can be directed when going up the tree, the bandwidth of the data router 15, that is, the rate at which it can transfer message packets 30, would decrease at higher levels.
Since increasing numbers of data router nodes 22(i,j,k) are provided at higher levels in the "fat-tree" design, the reduction in bandwidth at higher levels can be minimized or controlled. As noted above, the fan-out of data router node groups
20(i,j), that is, the number of data router nodes 22(i+1,j,k) at level "i+1" connected to each data router node 22(i,j,k) at level "i" can vary from level to level, and can be selected to maintain a desired minimum bandwidth between the respective levels "i" and "i+1." Alternatively, the fan-outs from each level to the next higher level can be selected so that the entire data router 15 has a selected minimum bandwidth.
Further, as noted above, each data router node 22(i,j,k) randomly selects the data router node 22(i+1,j,k) in the next higher level to which it directs a message packet 30 in the path up the tree. Accordingly, the message packets are randomly distributed through the higher levels of the tree, which minimizes the likelihood of bottlenecks and maximizes the bandwidth in the higher levels.
As shown in FIGS. 2A and 2B, each data router node group 20(i,j), and in particular each data router node 22(i,j,k), in the data router 15 receives an AFD(i,j) all-fall-down (i,j) signal. The AFD(i,j) all-fall-down (i,j) signal is provided by the control network 14, as will be described below in connection with FIGS. 4A and 4B, under control of the scalar processors 12 to initiate a context switch operation. The AFD(i,j) all-fall-down (i,j) signal, when asserted, enables the data router 15
to enter an all-fall-down mode, in which it quickly empties itself of message packets. In response to the AFD(i,j) all-fall-down (i,j) signal, the data router 15 directs all message packets 30 directly down the tree to the leaves 21, where they are stored until the context in which the message packets were generated is restored. At that point, the leaves 21 which receive such messages can transmit them over the data router 15, which will deliver them to the intended destinations.
In contrast to normal operation described above, in which the contents of the header 40 are decremented and flits containing down path identifier fields 42 discarded as the message packet 30 is directed down the tree, when the AFD(i,j) all-fall-down (i,j) signal is asserted the contents of the header 40 are not decremented and no changes are made to the flits containing the down path identifier fields 42. When the context is restored and the leaves 21 return the message packets to the data router 15, they will be delivered to the proper destination leaves. This can be seen from the following explanation.
In the following explanation, reference numerals 21(x) and 21(y) will refer to the original source and destination leaves, respectively, for a message packet 30 and reference numeral 21(x') will refer to the intermediate storage leaf which receives and stores the message packet 30 while the context in which the data router message packet 30 was generated is being switched out. First, for those message packets that are being transferred up the tree or that have reached the selected maximum height when the AFD(i,j) all-fall-down (i,j) signal is asserted, the contents of the header 40 and down path identification portion 41 are the same as when they were originally transmitted by the source leaf 21(x). Since the intermediate storage leaf
21(x') receives the message packet 30 it must be part of a sub-tree of the data router 15 that includes both the source leaf 21(x) and the destination leaf 21(y). Further, the sub-tree has the same root data router node group 20(i,j) that the message packet 30 would have reached had the AFD(i,j) all-fall-down (i,j) signal not been asserted. Accordingly, when the intermediate storage leaf 21(x') transmits the message packet over the data router 15, the packet will go up the tree and reach the same data router node group 20(i,j) that it would have reached if the AFD(i,j) all-fall-down (i,j) signal had not been asserted, and from there will follow the same downward path, defined by the down path identification portion 41, that it would have taken.
On the other hand, if a message packet is being transferred down the tree when the AFD(i,j) all-fall-down (i,j) signal is asserted, prior to the signal's assertion the contents of the header field 40 are decremented as the message packet is passed from level to level. Accordingly, it will be appreciated that, when the message packet 30 is transmitted by the intermediate storage leaf 21(x'), in its path up the tree it will go only to a data router node group 20(i,j) at the level indicated in the header field 40, which, in turn, corresponds to the data router node group 20(i,j) which controlled the direction of transfer of the message packet 30 when the AFD(i,j) all-fall-down (i,j) signal signal was asserted. It will be appreciated that the data router node group 20(i,j) that the message packet 30 reaches may not be the root of a sub-tree that includes the source leaf 21(x). However, it will be the root of a sub-tree that includes both the intermediate storage leaf 21(x'), since the message packet 30 was transferred from that data router node group 20(i,j) to the intermediate storage leaf 21(x'), and the destination leaf 21(y), since the message packet 30 could have been transferred from that data router node group 20(i,j) to the destination leaf had the AFD all-fall-down (i,j) signal not been asserted.
As will be described in further detail below, each leaf 21 maintains a message counter that it increments when it transmits a message packet over the data router 15, and that it decrements when it receives a message packet from the data router
15. As noted above, the control network 14 performs selected arithmetic operations, whose results can be provided to the processing elements 11 and scalar processors 12. By enabling the control network 14 to perform selected arithmetic operations using the values of the message counters, the results can identify when all of the message packets that were transmitted over the data router 15 have been received by the leaves 21, thereby indicating that the data router 15 is empty. This can be used to indicate that a message transfer operation has been completed, or that the router 15 is empty as a result of the assertion AFD(i,j) all-fall-down (i,j) signal so that a context switch can occur.
2. Control Network 14
As noted above, the control network 14 transfers program commands from the scalar processors 12 to the processing elements 11 and returns status information to the scalar processors 12, and in addition performs selected types of arithmetic operations. The control network 14 will be generally described in connection with block diagrams depicted in FIGS. 4A and 4B, and with FIG. 5, which depicts the structure of a control network message packet.
With reference first to FIGS. 4A and 4B, the control network 14, like the data router 15, is generally tree-structured, having a plurality of control network node groups 50(i,j) ("i" and "j" are integers organized in a plurality of levels each identified by the index "i" in reference numeral 50(i,j). In the reference numeral 50(i,j), the index (j) distinguishes the diverse control network node group 50(i,j) at each level "i." The tree structure of the control network 14 is generally similar to that of the data router 15. In particular, each control network node group 50(i,j) corresponds to a data router node group 20(i,j) having the same values for indices "i" and "j", and connections among control network node groups 50(i,j) follow the same pattern as connections among data router node groups 20(i,j). Each control network node group 50(1,j) in the lowest level is connected to one or more leaves 21, in the same pattern as the connections in the data router 15.
Similar terminology will be used in describing the control network 14 as was used in describing the data router 15 above. In particular, in the control network 15 represented in FIG. 2A, the control network node group 50(M,0) at the highest level "M" is termed the "physical root" of the tree. At each level "i", each control network node group 50(i,j) is termed the "parent" of control network node group 50(i-1,j) connected thereto, and each control network node group 50(i-1,j) is termed a "child" of the control network node group 50(i,j) to which it is connected. The control network node group 50(i,j) will also be a child of the control network node group 50(i+1,j) connected thereto. In one particular embodiment, each control network node group 50(i,j) in a particular level "i" is connected to four child control network node groups 50(i-1,j), in which case the "fan-out" of the tree, that is, the number of children connected to each parent, is four. As indicated above in connection with the data router 15, the fan-out need not be constant, but may vary from level to level and also among control network node groups 50(i,j) within the same level.
The structure of a control network node group 50(i,j), which is shown on FIG. 4B, differs from the structure of a data router node group 20(i,j). With reference to FIG. 4B, a control network node group 50(i,j) includes three control network nodes 51(i,j,l), where "i" can have the values "P," "C.sub.1 " or "C.sub.2." Within a control network node group 50(i,j), the control network nodes are connected so that control network node 51(i,j,P) is parent of child control network nodes
51(i,j,C.sub.1) and 51(i,j,C.sub.2). It will be appreciated that parent control network node 51(i,j,P) of control network node group 50(i,j) is itself a child of a control network node 51(i+1,j,C.sub.1) or control network node 51(i+1,j,C.sub.2) of a control network node group 50(i,j) of the next higher level "i+1." Similarly, each child control network node 51(i,j,C) is a parent of either a leaf 21 or a control network node 51(i-1,j,P) of the next lower level "i-1."
It should be noted that, in FIGS. 4A and 4B, the indices "j" for control network nodes 51(i,j,l) in each level increase from right to left. In the following, for each parent control network node 51(i+1,j,l), the child control network node
51(i,j,l) connected thereto with the lower index "j" will be termed the "left" child, and the control network node 51(i,j,l) with the higher index "j" will be termed the "right" child.
The control network node group 50(i,j) thus contains two sub-levels of control network nodes 51(i,j,l), one defined by parent control network node 51(i,j,P), and the other defined by child control network nodes 51(i,j,C.sub.1) and
51(i,j,C.sub.2). This enables the control network node groups 50(i,j) to have the same connection pattern within the control network 14 as the corresponding data router node groups 20(i,j) within the data router 15, while at the same time providing a two-child/one-parent connection for the control network nodes 51(i,j,l) which simplifies performance of the arithmetic operations as described below.
As in the data router 15, each connection between control network nodes 51(i,j,l) depicted in FIGS. 4A and 4B represents two unidirectional data paths, which transfer message packets in opposite directions between the respective nodes.
As noted above, the scalar processors 12 use the control network 14 to broadcast commands to the processing elements 11. In this operation, a scalar processor 12 transmits a message packet, which will be described below in detail in connection with FIG. 5, to the control network node 51(1,j,C) to which it is connected. The control network nodes transfer the message packet up the tree to the root, which then transmits the message packet down the tree to its children. As each control network node receives such a downwardly-going message packet, it transmits it to all of its children until the packet is delivered to the leaves 21. The control network 14 effectively broadcasts the message packet, and thus the command, to all of the processing elements 11. It will be appreciated that the message packet will also be received at leaves 21 comprising scalar processors 12 and input/output processors 13, but these processors can be configured to ignore the packet.
As also noted above, the system 10 can be partitioned so as to effectively constitute multiple independently-operable systems, each including at least one scalar processor 12 and one or more processing elements 11. In partitioning the system 10, the scalar processor 12 establishes a logical root in a control network node 51(i,j,l) in the control network 14 which differs from the control network node 51(M,0,P) which constitutes the physical root. The logical root effectively comprises the root of a sub-tree whose leaves include the scalar processor 12 and one or more other leaves 21. If a control network node 51(i,j,l) becomes a logical root, while it is a logical root its parent node 51(i+1,j,l) in the control network 14 does not not transmit downwardly-going message packets thereto.
Each control network node 51(i,j,l) includes a root flag 1407, which is described in detail in connection with FIGS. 12A below. When the root flag 1407 is set, the control network node 51(i,j,l) is a root of the control network 15. If the control network node 51(i,j,l) is to be a physical root, the root flag 1407 may be set by appropriate conditioning of an input signal that controls the control network node. To establish a control network node 51(i,j,l) as a logical root, the scalar processor 12 transmits a control network message packet therefor up the tree comprising control network 14. The message packet includes a height value identifying the level and sub-level at which the logical root is to be established. Each control network node 51(i,j,l) which receives the message packet determines whether the height value corresponds to its level and sub-level, and if not passes the message packet to the next control network notd 51(i,j,l) up the tree. When a control network node
51(i,j,l) determines that the height value in the message packet corresponds to its level and sub-level, it sets its root flag 1407 and begins operating as a logical root as described above. In connection with that, the control network node 51(i,j,l) notifies its parent control network node 51(i,j,l) that it is a logical root.
It will be appreciated that a control network node 51(i,j,l) operating as a logical root of a partition may receive a message packet that indicates that a control network node 51(i+x,j,m) at a higher level or sub-level is to operate as a logical root. A scalar processor 11 may issue such a message to, for example, increase the number of processing elements 11 or scalar processors 12 in the partition. In that event, the control network node 51(i,j,l) stops operating as a logical root.
To simplify the following description, the term "root node," which may appear with or without the reference numeral 51(i,j,l), will be used to collectively refer to the physical root control network node 51(M,0,P), in situations in which the control network 14 is not partitioned, and to a control network node 51(i,j,l) comprising a logical root in situations in which the control network 14 is partitioned. If the control network 14 is partitioned, the logical root node functions for the other control network nodes 51(i,j,l) in the partition substantially in the same manner as the physical control network node 51(M,0,P) functions for the control network nodes 51(i,j,l) in an unpartitioned control network 14. Otherwise stated, the physical root node can be considered as the logical root node of a partition comprising the entire system 10.
As noted above, the control network 14 also performs several types of arithmetic operations in response to control network message packets therefor, including scan and reduce operations. Scan operations are generally described in Guy E. Blelloch, Scan Primitives and Parallel Vector Models, (Ph.D. Dissertation, Massachusetts Institute of Technology: 1988). In a scan operation initiated by processing elements 11 that are logically arranged in a particular ordering, such as with increasing indices "i" in reference numeral 11(i) (with indices increasing, for example, from right to left as shown in FIG. 4B), the scan operation for a particular arithmetic operator "*" on items of data "D(i)" maintained by the processing element
11(i) produces at each of the successive processing elements 11 in the ordering the result "R(i)":
In the scan operation, the arithmetic operator may constitute a number of types of operators, including, for example, signed or unsigned addition, OR, XOR (exclusive-OR) and MAX, the latter referencing determination of a maximum of a set of values.
To accommodate scan operations, each processing element 11 includes an up data processor 1421, a down data processor 1652, and a scan buffer 1410, all of which will be described below in connection with FIGS. 12A through 12D-1. To initiate a scan operation, the processing elements 11 transfer control network message packets therefor over the control network 14. The control network message packet provided by each processing element 11(i) includes that processing element's data item D(i).
With reference to FIG. 4B, each control network node 51(1,j,C.sub.1) and 51(1,j,C.sub.2), on receiving a message packet from the processing elements connected thereto, loads the data from the left processing element, that is, the processing element 11(i) with the index "i" being zero or an even number, into its scan buffer 1410. In addition, the up data processor 1421 of each control network node 51(1,j,C) performs the arithmetic operation on the data to generate a result that corresponds to the combination of the data received from the two processing elements 11 connected thereto, combined according to the arithmetic operator being used in the scan operation. The control network node 51(1,j,C) uses the value generated by the up data processor 1421 as data in a message packet, which it transmits to its parent.
Each control network node 51(i,j,l), except for the root node, on receiving message packets from both its left and right children, performs the same series of operations. In particular, each control network node 51(i,j,l) at each sub-level up to the root node:
(a) stores in its scan buffer 1410 the data in the control network message packet that it receives from its left child control network node 51(i-1,j,l); it will be appreciated that this value corresponds to the combination of the data from the processing elements in the sub-tree of the control network 14 whose root is the left child control network node 51(i-1,j,l), combined according to the arithmetic operator being used in the scan operation, and
(b) performs, using its up data processor 1421 the operation, defined by the arithmetic operator being used in the scan operation, in connection with data from both of its children to generate a value which it transmits in a message to its parent. It will be appreciated that this value corresponds to the combination of the data from the processing elements in both sub-trees of the control network 14 whose roots are both child control network network nodes 51(i-1,j,l) connected thereto.
Thus, at the point at which all control network message packets have propagated up the control network tree, the scan buffer 1410 at each control network node 51(i,j,l), other than the root node, contains a value corresponding to the data provided by the processing elements 11 in the sub-tree whose root is the node's left child, processed according to the scan operaiton's arithmetic operator.
The root node also receives message packets from both of its children containing intermediate results for a scan operation, it transmits message packets down the tree. The root node receives, from each child, a value corresponding to the data provided by the processing elements 11 in the sub-tree whose root is the respective child, processed according to the scan operation's arithmetic operator. It will be appreciated that the value received from the left child control network node corresponds to the combination of the data from the processing elements in the sub-tree of the control network 14 whose root is that left child control network node, and the value received from the right control network node corresponds to the combination of the data from the processing elements in the sub-tree whose root is the right control network node, in both cases the data being combined according to the scan operation's arithmetic operator.
To its left child, the root node transmits a message packet whose data has the value zero. To its right child, the root node transmits a packet whose data has the value received from the left child. As noted above, that value corresponds to the combination of the data from the processing elements in the sub-tree of the control network 14 whose root is that left child control network node, combined according to the scan operation's arithmetic operator.
When each control network node 51(i,j,l) below the root node receives a control network message packet from its parent, it
(a) uses the down data processor 1652 to generate a value corresponding to the value of the data received from the parent combined with the intermediate result stored in the nodes' scan buffer 1410 according to the arithmetic operator used in the particular scan operation, which it transmits in a control network message packet to its right child; it will be appreciated that this value corresponds to the combination of the data from the processing elements 11 in all sub-trees of the control network 14 up to the one whose root is the left child of the control network node, combined according to the arithmetic operator being used in the scan operation, and
(b) transmits a control network message packet to its left child whose data has the same value as that received from the parent; it will be appreciated that this value corresponds to the combination of the data from the processing elements in all sub-trees of the control network 14 up to the one whose root is the left child of the parent of the control network node, combined according to the arithmetic operator being used in the scan operation.
Thus, the control network message packets transmitted by the control network nodes 51(i,j,l) down the tree will propagate the zero value down the left side to the left-most processing element 11(0). The next processing element 11(1) will receive the combination, as defined by the arithmetic operator, of the zero value propagated from the root node and the value stored in the scan buffer 1410 of the control network node 51(1,0,C.sub.1), which corresponds to the value of the data transmitted by the processing element 11(0).
The next processing element 11(2) will receive, as the left child connected to the control network node 51(1,0,C.sub.2) the value stored in the scan buffer 1410 of the control network node 51(1,0,P), which, as noted above, corresponds to the combination, as defined by the scan operation's arithmetic operator, of the data from the processing elements 11(0) and 11(1). The processing element 11(3) will receive, as the right child, the combination of that value and the value in the scan buffer
1410 of control network node 51(1,0,C.sub.2), which, as noted above, corresponds to the data provided by the processing element 11(2). Accordingly, the processing element 11(3) will receive the combination, as defined by the scan operation's arithmetic operator, of the data from processing elements 11(0), 11(1) and 11(2).
It will be appreciated that the control network nodes 51 will combine the data provided to the successive processing elements 11 in the sub-tree of the root node's left child similarly. Accordingly, each processing element 11(i) in that sub-tree will receives a value corresponding to data from processing elements 11(i-1) through 11(0) combined according to the arithmetic operator of the particular scan operation.
The control network nodes 51 in the sub-tree of the root node's right child also combine the data in the control network message packet provided by their respective parents with the data in their respective scan buffer 1410 in a similar manner. As noted above, the root node transmits to its right child a control network message packet including a value corresponding to the combination of the data provided by the processing elements 11 in the sub-tree defined by the root node's left child, combined according to the scan operation's arithmetic operator. It will be appreciated that the control network message packets transmitted by the control network nodes 51(i,j,l) in that sub-tree will propagate that value down the left side of the sub-tree to the left-most processing element 11(i), so that that processing element 11(i) also receives a value corresponding to data from processing elements 11(i-1) through 11(0) combined according to the arithmetic operator of the particular scan operation. Since the control network nodes 51(i,j,l) in that sub-tree operate in a manner similar to those in the sub-tree defined by the root node's left child, each processing element 11(i) will receive a value corresponding to data from processing elements 11(i-1) through 11(0) combined according to the arithmetic operator of the particular scan operation.
The control network 14 can also perform a backward scan operation, in which the scan direction is from right to left, that is, toward processing elements 11(i) of lower indices. In that case, each processing element 11(i) will receive a value corresponding to data from processing elements 11(i+1) through 11(N) (where "N" is the highest index) combined according to the arithmetic operator of the particular scan operation. In that operation, each control network node 51(i,j,l) interchanges control network message packets that it receives at its input terminals from its children, and also the control network message packet that it transmits through the outputs to its children, and otherwise operates similar to that above.
This effectively interchanges the left and right children at each level, so that if the control network nodes 51 otherwise operate as described above, the scan direction will be reversed.
In addition, the control network 14 can perform a segmented scan operation, in which the processing elements 11 of a partition may be divided into two or more segments. In each case, the first processing element 11(i) in the first segment is the first processing element 11(i) in the partition. The first processing element 11(i) in each succeeding segment transmits a control network message packet in which a segment bit is set. Each control network node 51(i,j,l) also includes a segment flag
1561 (FIG. 12B-1G). Each control network node 51(i,j,l) operates as described above, except that in transmitting control network message packets up the control network tree:
(a) if it receives a control network message packet from its right child in which the segment bit is set, it transmits in a control network message packet to its parent data corresponding only to the data in the control network message packet received from the right child; and
(b) if it receives a control network message packet from either child in which the segment bit is set, it sets its segment flag 1561, and sets the segment bit in the control network message packet it that transmits to its parent.
In either case, the control network node 51 buffers the data received from the left child control network node in its scan buffer 1410, in the same manner as in an unsegmented scan operation as described above.
In connection with control network message packets that are transmitted down the control network tree, each control network node 51, if its segment flag 1561 is set, transmits to its right child a control network message packet whose data corresponds to the value stored in the scan buffer 1410. The control network node 51 transmits to it left child a control network message packet whose data corresponds to the data from its parent, in the same manner as in an unsegmented scan operation as described above.
It will be appreciated that the first processing element 11(i) which is the first in each segment, other than the processing element 11(i) comprising the first in the partition, will not receive the value zero, as required in Eqn. 1 above. However, since those processing elements 11, in initiating the scan operation, transmitted control network message packets whose segment bits were set, they are aware that they are the first processing elements 11(i) in their respective segments, and can interpret the value received as zero.
In a reduce operation for a particular arithmetic operator "*" on items of data "D(i)" maintained by the processing elements 11(i) produces at all of the processing elements 11 the same result "R":
In a reduce operation, the arithmetic operator may constitute a number of types of operators, including, for example, signed or unsigned addition, OR, XOR and determination of a maximum.
In performing a reduce operation, the processing elements 11 transfer message packets therefor over the control network 14. The message packet provided by each processing element 11(i) includes that processing element's data item D(i). With reference to FIG. 4B, each control network node 51(1,j,C), on receiving a message packet from the processing elements connected thereto, performs the operation specified by the mathematical operator to generate an intermediate result, which it transmits in a message packet to its parent node 51(1,j,P).
This operation is repeated at successive parent nodes at higher levels in the tree comprising control network 14 until the message packets reach the root node. When the root node receives message packets from both of its children, it performs the operation specified by the mathematical operator on the data from its two children to generate a result value. The root node generates message packets whose data is the result value and transmits them to both of its children. Each of the control network nodes 51(i,j,l) that receives such a message packet repeats it to both of its children, until they reach the processing elements 11, thereby broadcasting the result to all of the processing elements 11.
As noted above, the leaves 21(i) may comprise a processing element 11(i), a scalar processor 12(i) or an input/output processor 13(i). In the above description, only the processing elements 11(i) have been indicated as engaging in scan operations and reduce operations. It will be appreciated, however, that scalar processors 12(i) and input/output processors 13(i) may, along with processing elements 11(i), engage in such operations. Alternatively, the scalar processors 12(i) and input/output processors 13(i) may abstain from the scan and reduce operations. They may accomplish this either by transmitting control network message packets which contain data having a value of zero, or by transmitting a special type of control network message packet, described below as an abstain type, which the control network nodes 51(i,j,l) may treat as containing data having the value zero.
As noted above, each processing element 11 maintains a message counter which counts data router message packets it transmits and receives over the data router 15. The processing element 11 increments the message counter when it transmits a data router message packet over the data router 15 and decrements the counter when it receives a data router message packet over the data router 15 during a message transfer operation. It will be appreciated that during a message transfer operation some processing elements 11 may transmit more data router message packets than they receive, and thus at the end of the message transfer operation the message counter will have a positive value. On the other hand, some processing elements 11 may receive more data router message packets than they transmit during the message transfer operation, in which case the message counter will have a negative value at the end of the messae transfer operation.
The processing elements 11 use the control network 14, in particular enabling a reduce operation, to determine when the data router 15 is empty, that is, when the data router 15 has delivered all data router message packets to processing elements
11. More specifically, each processing element 11, after it transmits all of its data router message packets for the message transfer operation, begins transmitting control network message packets specifying a reduce operation, with signed addition as the arithmetic operator. The data in each control network message packet is the current value of the processing element's message counter. The processing elements 11 iteratively transmit such control network message packets until they receive a control network message packet whose data has the result value of zero. It will be appreciated that, at that point the processing elements 11 have collectively received as many data router message packets as they transmitted during the message transfer operation, and so the data router 15 will be empty of data router message packets.
FIG. 5 depicts the structure of a control network message packet 60 that is transferred over the control network 14. With reference to FIG. 5, the control network message packet 60 has a fixed length of thirteen "flicks." In one embodiment, each flick has five bits, with the first twelve flicks, identified as FLICK 0 through FLICK 11, including four packet information bits (labelled "PKT INFO" in FIG. 5) and one tag bit. The packet information portion of the first twelve flicks comprise a packet header portion 61 and a packet data portion 62. The thirteenth flick, namely FLICK 12 identified by reference numeral 63, contains a checksum used in error detection. The checksum is generated across all five bits of the successive flicks in the packet 60. The tag bits contain control information as described below.
The packet header portion 61 includes four fields, including a message type field 64, a packet type field 65, a combine function type field 66 and a pattern field 67(0) and 67(1) (collectively identified by reference numeral 67). The packet data portion 62 includes eight four-bit data nibbles 70(0) through 70(7) (generally identified by reference numeral 70) and a four-bit nibble 71 containing global information.
The message type field 64 identifies the type of message contained in the message packet 60. In one embodiment, a packet 60 can contain one of five different types of messages, including an SS (single source) message, an MS (multiple source) message, an ABS abstain message, an IDLE message and an NPAC nil packet message. When a scalar processor 12 broadcasts a command to the processing elements 11 for processing thereby, it uses a single source message packet to carry the command. In addition, a scalar processor 12 may also use single source message packets to broadcast other types of control information to one or more of the processing elements 11 or input/output processors 13, or to another scalar processor 12.
A single source message packet is passed by each control network node 51(i,j,l) which receives it up the control network tree from node to node until it reaches the root node. The root node transmits the single source message packet down the tree to its children. Each control network node 51(i,j,l), which receives a single source message packet from its parent transmits it down the tree to both its children, effectively broadcasting the packet to all of the processing elements 11 in the partition.
Multiple source messages are used by the processing elements 11 to initiate scan and reduce operations as described above. Idle message packets are transmitted when a leaf 21 or control network node 51(i,j,l) has no other types of message packets to transit. A leaf 21 transmits abstain message packets to indicate that it is not participating in a scan or reduce operation. If a control network node 51(i,j,l) receives idle or abstain message packets from both of its children, it may transit a message packet of the same type to its parent. If a control network node 51(i,j,l) receives a multiple source message packet from one of its children and an abstain message packet from its other child, it does not thereafter wait for a multiple source message packet therefrom to use in the arithmetic operation specified in the multiple source message packet that it receives from the one child. Instead, the control network node 51(i,j,l) forwards the multiple source message packet that it receives to its parent, and, if the abstain message packet came from its right child, stores the data from the message packet in its scan buffer 1410.
A message packet of the nil packet type, unlike message packets of other message types, is only one flick in length. In particular, a nil packet message comprises only the message the flick 64, the contents indicating that the message packet is of the nil packet type. A control network node 51(i,j,l) continually transits messages of the nil packet type to its parent while it [that is, the control network node 51(i,j,l)] is a logical root of a partition, and the parent transits message packets of the same type to that child. If the parent receives a multiple source message packet from its other child, it forwards it to its parent.
The packet type field 65, combine function type field 66 and a pattern field 67 contain further information about the information in the control network message packet 60.
In one particular embodiment, the processing elements 11 can operate in two operational modes, identified herein as "supervisor" and "user." If the message type field 64 indicates that the control network message packet is a single source message packet, the packet type field 65 can identify a message packet as a broadcast supervisor packet or a broadcast user packet. If the packet type field 65 indicates that the control network message packet is a broadcast supervisor packet, it contains a command for execution by the processing elements 11 in the supervisor mode. On the other hand, if the packet type field indicates that the control network message packet contains a broadcast user packet, it contains a command for execution by the processing elements 11 in the user mode.
In addition, if the message type field 64 indicates that the control network message packet is a single source message packet, the packet type field 65 may indicate that the control network message packet is an interrupt packet. The interrupt packet may be used to initiate operations at particular ones of the processing elements 11. The operations and the particular ones of the processing elements 11 to perform them may be identified in the packet data portion 62.
Further, if the message type field 64 indicates that the control network message packet is a single source message packet, the packet type field 65 may indicate that the control network message packet contains configuration information which enables the establishment or elimination of a logical root at a particular control network node 51(i,j,l). If the packet type field identifies the message packet as containing configuration information, the first two flicks 70(0) and 70(1) of packet data portion 62 contain data specifying the level and sub-level in control network 14 at which the logical root is to be established. The control network node 51(i,j,l) at that level and sub-level which receives the configuration message packet establishes itself as the logical root.
If the message type field 64 identifies the messge packet as a multiple source message packet, the packet type field 65 identifies the operation to be performed as a scan involving data in a single packet or a plurality of packets, or to perform an operation to determine whether the data router 15 is empty. The data to be used is contained in data fields 70(0) through 70(7) (generally identified by reference numeral 70) of the packet data portion 62. If the packet type field 65 identifies a scan operation involving data in a single packet, the scan operation is limited to a data value having a single thirty-two bit word. However, if the packet type field identifies a scan operation involving data in a plurality of successively-transmitted packet, which will be identified as a "multi-word scan," the scan operation involves data values of more than thirty-two bits, which are contained in control network message packets 60 successively transmitted by the processing elements 11. In either case, if the packet type field 65 identifies the operation as a scan operation, the pattern field 67 further identifies it as either a scan forward or scan backward operation or a reduce operation, and combine function type field 66 identifies the particular arithmetic operator to be used in the operation.
As has been described above, control network message packets of the multiple source type may be used, with arithmetic operations, to determine whether the data router 15 is empty, using the contents of message counters maintained by the processing elements 11 as data. Similar control network message packets may also be used to perform other control operations using, for example, bits of the global information field 71. For example, the scalar processors 12 may need to be notified when all of the processing elements 11 have finished executing a particular command before they transmit a subsequent command. In that case, each processing element when it has finished executing a command, may tranmsit a control network message packet 60, of the multiple source type, indicating a reduce operation using the OR operator, with a particular bit in the global information field 71 being set. It will be appreciated that, after all of the processing elements 11 have executed the istruction and transmitted corresponding packets, the root node will as the result of the reduce operation, broadcast control network message packets down the control network tree in which the bit will be set. When the scalar processor 12 receives the resulting control network message packet from the control network node 51(1,j,l) connected thereto, it can determine the condition of the bit and determine therefrom that the command has been executed.
Bits of the global information field 71 may also be used by the processing elements 11. In processing certain commands from the scalar processors 12, the processing elements 11 sometimes may reach a point in processing a command at which they have to verify that all of the processing elements have reached the same point before they proceed. To accomplish that, when each processing element has reached the particular processing point it may transmit a control network message packet as described above, that is, of the multiple source type, indicating a reduce operation using the OR operator, with a particular bit in the global information field 71 being set. When the processing elements 11 receive the resulting control network message packet from their respective control network nodes 51(1,j,l) connected thereto, they can determine therefrom that all of the processing elements 11 have reached the required point in their processing of the command, and continue processing.
The tag bits of the successive flicks in a control network message packet 60 contain various types of control and status information. Several of the tag bits control the flow of control network message packets through the control network 14. Five tag bits comprise scan flow bits, generally identified by reference numerals 72(i) ("i" is an integer from "1" through "5"). The control network nodes 51(i,j,l), processing elements 11 and scalar processors 12, as well as any input/output processors 13 which transmit and receive control network message packets over the control network 14, use the scan flow bits to control the transfer of message packets between directly-connected components in the control network 14.
Two tag bits, including a broadcast user flow bit 73 and a broadcast supervisor flow bit 74 are conditioned by the processing elements 11, scalar processors 12 and those input/output processors 13 which transmit control network message packets over the control network 14, to indicate whether they are able to receive control network message packets containing control information for the supervisor and user modes respectively. Each processing element 11, scalar processor 12 and input/output processor 13, respectively, conditions bits 73 and 74 in any control network message packets that it transmits to indicate whether it can receive single source message packets having packet types, as indicated in packet type field 65, of broadcast supervisor type and broadcast user type, respectively.
Another tag bit that controls the control network 14 is a flush bit 75. When a control network node 51(i,j,l) receives a control network message packet in which the flush bit 75 is set, it clears its scan buffer. This may be used to clear intermediate results of a scan or reduce operation from the control network 14 during a context switch.
A soft error bit 76 is used by a control network node 51(i,j,l) to indicate that it has detected a software error from the contents of a control network message packet 60. For example, if the control network node 51(i,j,l) determines that the contents of the packet type field 65 do not identify one of the established packet types for the message type identified in message type field 65, the node may set the soft error bit 76.
As described above, the control network 14 performs segmented scan operations using data in message packets transmitted by the processing elements 11. A segment bit 77, when set, indicates that the control network message packet 60 contains data for the upper end of a segment. A scan overflow bit 80, when set, indicates that the result of the arithmetic operation is larger than can be accommodated in the data fields 70 of the control network message packet 60. The scan overflow bit 80 may also be used to indicate overflow during a reduce operation. If the scan overflow bit 80 is set, the operation can be repeated in a multi-word operation.
Finally, a control network message packet 60 includes an AFD all-fall-down bit 81. If a control network node 51(i,j,l) receives a control network message packet 60 in which the AFD all-fall-down bit 81 is set, it asserts an ADF(i,j,l) all-fall-down signal. The AFD(i,j,l) all-fall-down (i,j) signal from each parent control network node 51(i,j,P) is connected to the data router nodes 22(i,j,k) of the data router node group 20(i,j) having the same indices "i" and "j."
3. Diagnostic Network 16
As noted above, the diagnostic network 16, under control of a diagnostic processor, facilitates testing of other portions of the system 10 to identify, locate and diagnose defects. In addition, the diagnostic network 16 may be used to establish selected operating conditions in the other portions of the system 10 as described below. The general structure of the diagnostic network 16, and its connection to the other elements of the system 10, will be described in connection with FIGS. 6A through
6C. Messages transferred over the diagnostic network 16 will be described in connection with FIG. 7.
With reference to FIGS. 6A through 6C, the diagnostic network 16 includes a plurality of diagnostic network node generally identified by reference numeral 100(h,p,r-l), where "h" and "p" comprise integers representing a height value and a pod-type value, and "r-l" comprises one or more integers which together comprise a root-leaf value. The various diagnostic network nodes 100(h,p,r-l) are connected in a tree-type structure which actually forms a tree of trees as shown in the Figs. In particular, the diagnostic network 16 includes a high-order tree identified as a height-decoding tree, as represented by the diagnostic network nodes 100(h,p,r-l) in the left-most columns of the respective FIGS. 6A through 6C. Each diagnostic network node 100(h,p,r-l) in the height decoding tree is identified by a reference numeral 100(h,0,0 . . . 0), where the value of "h" is associated with a level in the data router 15 and control network 14. A diagnostic processor 101 is connected to the diagnostic network node 100(h,0,0 . . . 0) at the highest level of the height decoding tree.
The height decoding tree is essentially a linear tree, that is, there is no fan-out from level to level in the height decoding tree. The height decoding tree essentially forms the backbone of other lower-level trees in the diagnostic network 16, including a pod-type decoding tree, represented by diagnostic network nodes 100(h,p,r-l) in the middle column of FIGS. 6A through 6C, and a root-leaf decoding tree represented by diagnostic network node 100(h,p,r-l) in the right-hand column of FIG. 6A through 6C. In particular, depending from each diagnostic network node 100(h,0,0 . . . 0) in the height decoding tree is a diagnostic network node 100(h,1,0 . . . 0), which comprises the pod-type decoding tree. Although only one diagnostic network node 100(h,1,0 . . . 0) is shown in the pod-type decoding tree at each level, the diagnostic network 16 may include multiple decoding nodes connected in a tree structure. In that case, the diagnostic network node 100(h,1,0 . . . 0) will comprise the root of the pod-type decoding tree, and other diagnostic network nodes 100(h,p,0 . . . 0) will comprise intermediate nodes and leaves of the pod-type decoding tree.
In addition, depending from diagnostic network nodes 100(h,1,0 . . . 0) in the pod-type decoding tree are diagnostic network nodes 100(h,p,r-l) comprising the root-leaf decoding tree. As shown in FIGS. 6A through 6C, depending from each diagnostic network node 100(h,1,0 . . . 0) in the pod-type decoding tree is a one or more trees of diagnostic network nodes 100(h,p,r-l) in the root-leaf decoding tree. In the embodiment depicted in FIGS. 6A through 6C, each diagnostic network node
100(h,p,r-l) can accommodate a fan-out of two, and so if the pod-type decoding tree includes one diagnostic network node 100(h,1,0 . . . 0), the diagnostic network 16 at that level may include up to two root-leaf decoding trees, which may connect to diverse types of other components in the system 10. Each root-leaf decoding tree includes a root diagnostic network node 100(h,p,r . . . 0) connected to the pod-type decoding tree, and extends to a plurality of leaf diagnostic network nodes
100(h,p,r-l) connected to a particular type of pods in the system 10.
The portions of system 10 comprising "pods" may depend upon the physical embodiment of the particular system. As depicted on FIGS. 6A through 6C, the data router nodes 22(i,j,k) may comprise one type of pod, the control network nodes 51(i,j,l) may comprise a second type of pod, and the leaves 21 may comprise a third type of pod. As shown in FIG. 6A, level "M," which corresponds to the root level of the control network 14 and data router 15, includes two root-leaf decoding trees. One root-leaf decoding tree comprises the diagnostic network nodes identified by reference numerals 100(M,1,1 . . . 0) through 100(M,1,r-l), which is connected to the pods of the data router nodes in the root data router node group 20(M,0). The other root-leaf decoding tree comprises the diagnostic network node identified by reference numeral 100(M,2,1 . . . 0), which is connected to the pod comprising the root control network node group 50(M,0).
Similarly, level "M-1," which corresponds to one level below the root level of the control network 14 and data router 15, also includes two root-leaf decoding trees. One root-leaf decoding tree comprises the diagnostic network nodes identified by reference numerals 100(M-1,1,1 . . . 0) through 100(M,1,r-l), which is connected to the pods of the data router nodes in the data router node groups 20(M-1,j), one level below the root level. The other root-leaf decoding tree comprises the diagnostic network nodes identified by reference numerals 100(M,2,10 . . . 0), 100(M,2,1 1 . . . 0), and 100(M,2,1 2 . . . 0) which are connected to the pods comprising the root control network node group 50(M,0). The other levels of the diagnostic network 16, down to level "1," which corresponds to the lowest levels in the control network 14 and data router 15, are similar, including two root-leaf decoding trees, one connected to pods comprising the data router node groups 20(i,j) and the other connected to pods comprising the control network node groups 50(i,j).
As indicated above, the diagnostic network 16 also includes a level "0" connected to leaves 21 in the system 10. That level includes only one root-leaf decoding tree, comprising the diagnostic network nodes 100(0,1,1 . . . 0) through
100(0,1,r-l), all of which are connected to leaves 21.
A "pod" may comprise an individual data router node 22(i,j,k), control network node 50(i,j,l) or leaf 21, or groups thereof. In one particular embodiment, a "pod" is a "field-replaceable unit," such as an entire circuit board, which is replaceable by field-service or maintenance personnel. In that embodiment, the diagnostic network 16 can diagnose and locate failures in such field-replaceable units.
It will be appreciated that, if a pod-type decoding tree at any particular level includes multiple diagnostic network nodes 100(h,p,0 . . . 0) organized in a tree structure, multiple the root-leaf decoding trees can be provided each depending from a node comprising a leaf of the pod-type decoding tree. Thus, for example, if a particular level in the diagnostic network 16 required three or four root-leaf decoding trees, each connected to pods of particular types, if the fan-out from each level to the next in the pod-type decoding tree is two, the pod-type decoding tree would include at least three diagnostic network nodes 100(h,p,r-l), including a root node and two leaf nodes connected thereto. In that case, each leaf node would be able to connect to two root-leaf decoding trees. It will be appreciated that, if the fan-outs in each of the trees is different from two, the number of levels and number of nodes in each level within each tree may also differ from that specifically described herein. In one particular embodiment, fan-outs in particular diagnostic network nodes 100(h,p,r-l) of both two and eight are used, at different levels in the respective trees comprising the diagnostic network 16.
The diagnostic network nodes 100(h,p,r-l) are generally similar, and will be described in detail in connection with FIGS. 13A through 13C. In brief, each diagnostic network node 100(h,p,r-l) includes an address control portion, generally identified by reference numeral 102, and a data control portion, generally identified by reference numeral 103. The address control portion of diagnostic network node 100(M,0,0 . . . 0) receives address control signals from the diagnostic processor over a bus 104(P). The node uses the address control signals to establish address state in an address state store 105.
The address state maintained by the diagnostic network node 100(M,0,0 . . . 0) enables it to transmit subsequently-received address control signals (a) to one child node, in this case node 100(M-1,0,0 . . . 0) over a bus 104(C.sub.1), (b) to the other child node, in this case node 100(M, 1,0 . . . 0) over a bus 104(C.sub.2, (c) to both child nodes over the same buses, or, alternatively, (d) to neither child node. The node's address control portion 102 includes flags 106(C.sub.1) and
106(C.sub.2) each associated with a corresponding bus 104(C.sub.1) and 104(C.sub.2). If the flag 106(C.sub.i) is set in response to the received address control signals, the node is enabled to thereafter transmit the address control signals to the respective child node over a bus 104(C.sub.i), and otherwise it is clear.
The diagnostic processor 101 controls the conditioning of each of the flags 106(C.sub.i) in the state store 105 of diagnostic network node 100(M,0,0 . . . 0) serially. After the address state has been established in the state store 105 of diagnostic network node 100(M,0,0 . . . 0), the node transmits the address control signals that it subsequently receives over bus 104(P) from the diagnostic processor 101 over the particular buses 104(C.sub.i) whose flags 106(C.sub.i) are set. If both flags 106(C.sub.i) are set, the diagnostic network node 100(M,0,0 . . . 0) transmits the address control signals over both buses 104(C.sub.i) in parallel. The address control signals thereafter enable either or both of those nodes to condition the flags 106(C.sub.i) in their respective address state stores 105, enabling them to thereafter transmit the address control signals received thereby to either or both of the diagnostic network nodes 100(h,p,r-l) connected thereto. This process continues until flags 106(C.sub.i) are set in selected ones of the leaf diagnostic network nodes 100(h,p,r-l) in the root-leaf decoding tree. This process may be repeated any number of times to condition flags 106(C.sub.i) in any combination of the leaf diagnostic network nodes 100(h,p,r-l).
The sequence of flags 106(C.sub.i) that are set in the various diagnostic network nodes 100(h,p,r-l), from the root diagnostic network node 100(1,0,0 . . . 0) in the height decoding tree to the leaf diagnostic network nodes 100(h,p,r-l) in the root-leaf decoding trees, essentially form paths from the diagnostic processor 101 to selected pods. The paths may be subsequently used to carry diagnostic test data in parallel from the diagnostic processor to the selected pods, and to return test results.
After it has conditioned flags 106(C.sub.i) in the various diagnostic network nodes 100(h,p,r-l), the diagnostic processor 101 may also retrieve the state from each of the diagnostic network nodes 100(h,p,r-l). After each flag 106(C.sub.i) is conditioned, the diagnostic network node 100(h,p,r-l) may transmit a signal representing its state its state over its bus 104 (P), which is coupled up the tree to the diagnostic processor 101. If multiple flags are conditioned in diverse nodes in parallel, the diagnostic processor 101 transmits an expected address data signal, which enable the nodes intermediate the originating nodes and the diagnostic processor to combine the signals representing the state of the respective flags in response to a control signal from the diagnostic processor 101.
Thus, if the flags 106(C.sub.i) whose conditions are being retrieved are to be set, resulting in asserted state signals, the diagnostic processor 101 may enable the intermediate nodes to logically AND the flag state signals received from their child nodes. In that case, if an intermediate node receives a negated state signal, indicating that the flag 106(C.sub.i) whose condition is received is, erroneously, not set, the node will provide a negated state signal, which will be propagated up the tree to the diagnostic processor 101. On the other hand, if the flags whose conditions are being retrieved are to be cleared, resulting in negated state signals, the diagnostic processor 101 may enable the intermediate nodes to logically OR the flag state signals received from their child nodes. In that case, if an intermediate node receives an asserted state signal, indicating that the flag 106(C.sub.i) whose condition is received is, erroneously, not clear, the node will provide an asserted state signal, which will be propagated up the tree to the diagnostic processor 101.
After the diagnostic processor 101 has established the address states in the respective diagnostic network nodes 100(h,p,r-l) to selected pods, it may transmit a test data out signal and an expected test data control signal, which are received by the root diagnostic network node 100(M,0,0 . . . 0), over a bus 110(P). The root diagnostic network node 100(M,0,0 . . . 0) transmits the received signals over respective buses 110(C.sub.1) and 110(C.sub.2), as determined by the states of the respective flags 106(C.sub.i), and the other diagnostic network nodes do the same. Thus, the diagnostic network nodes 100(h,p,r-l) couple the test data out signal and expected test data control signal down the respective trees along paths defined by the set flags 106(C.sub.i). At some point, at least some of the leaf diagnostic network nodes 100(h,p,r-l) will couple test data signals to the selected pods, and obtain test data out signals representing diagnostic test results.
The diagnostic network nodes 100(h,p,r-l) will pass the test data out signals up the paths defined by the set flags 106(C.sub.i), each node combining the test data out signals received from its children in response to the expected test data control signal in a manner similar to that described above in connection with retrieval of the states of the respective flags. That is, if the test data out signal is expected to be asserted, the diagnostic processor 101 may enable the nodes to logically AND the test data signals received from the pods or child nodes connected thereto. In that case, if an intermediate node receives an erroneous negated test data out signal, the node will provide a negated test data out signal to its parent, which will be propagated up the tree defining the diagnostic network 16 to the diagnostic processor 101. On the other hand, if the test data out signal is expected to be negated, the diagnostic processor 101 may enable the intermediate nodes to logically OR the test data out signals received from the pods or the child nodes connected thereto. In that case, if an intermediate node receives an erroneous asserted test data out signal, the node will provide an asserted test data out signal to its parent, which will be propagated up the tree to the diagnostic processor 101.
If the diagnostic processor 101 receives an erroneous test data out signal, it can thereafter repeat the operations in connection with subsets of the previously-selected pods to identify the one which provided the erroneous signal. In that operation, the diagnostic processor 101 establishes states of the address flags 106(C.sub.i) in the diagnostic network nodes 100(h,p,r-l) to establish paths therethrough to a selected subset and repeats the test operation in connection with that subset. If the test data out signal indicates an erroneous result, the diagnostic processor 101 can reduce the size of the subset and repeat the operation. If the test data out signal indicates a correct result, on the other hand, the diagnostic processor 101
can repeat the operation in connection with a different subset. In one embodiment, the diagnostic processor 101 performs a binary search operation, iteratively repeating the operation in connection with half of the pods selected during the previous iteration to locate the pod providing the erroneous test data out signal.
Although not shown in FIGS. 6A through 6C, the diagnostic network 16 may include multiple diagnostic processors connected to various ones of the diagnostic network nodes 100(h,p,r-l). Each diagnostic processor may selectively control the portions of the tree defining the diagnostic network 16 below the diagnostic network node 100(h,p,r-l) connected thereto. Alternatively, the diagnostic processors may selectively condition the diagnostic network nodes 100(h,p,r-l) connected thereto to receive signals from, and transmit signals to, their respective parent diagnostic network nodes 100(h,p,r-l). The additional diagnostic processors may facilitate diverse diagnostic operations in various parts of the system 10 in concurrently.
In one specific embodiment, the interface between the leaf diagnostic network nodes 100(h,p,r-l) and the pods comprises the interface defined by the Joint Test Action Group ("JTAG"), as described in IEEE Std. 1149.1 (hereinafter "JTAG specification"). In any event, the interface provides a serial scan chain circuit in each pod. The serial scan chain circuit in each pod may extend through a number of registers and other storage elements in the respective pods, and may be used to establish the states thereof to thereby establish selected operating conditions in the respective pods. For example, each data router nodes 22(i,j,k) and control network nodes 51(i,j,l) uses height signals id