United States Patent
4654857
Samson et al.
March 31, 1987
Title
Digital data processor with high reliability
Abstract
A fault-tolerant computer system provides information transfers between the units of a computing module, including a processor unit and a memory unit and one or more peripheral control units, on a bus structure common to all the units. Information-handling parts of the system, both in the bus structure and in each unit, can have a duplicate partner. Error detectors check the operation of the bus structure and of each system unit to provide information transfers only on fault-free bus conductors and between fault-free units. The computer system can operate in this manner essentially without interruption in the event of faults by using only fault-free conductors and functional units. Arbitration circuits of unusual speed and simplicity provide units of the computing module with access to the common bus structure according to the priority of each unit. The units of a module check incoming and outgoing signals for errors, signal other module units of a detected error, and disable the unit from sending potentially erroneous information onto the bus structure.
Inventors:
Samson; Joseph E. (Dover, MA)
Wolff; Kenneth T. (Medway, MA)
Reid; Robert (Dunstable, MA)
Hendrie; Gardner C. (Marlboro, MA)
Falkoff; Daniel M. (Natick, MA)
Dynneson; Ronald E. (Brighton, MA)
Clemson; Daniel M. (Weston, MA)
Baty; Kurt F. (Medway, MA)
Assignee:
Stratus Computer, Inc. (Marlboro, MA)
Appl. No.:
762039
Filed:
August 2, 1985
Current U.S. Class:
714/5
902/38
Current International Class:
G06F 11/16 (20060101)
Field of Search:
371/7-11,68 364/200
U.S. Patent Documents
3469239   September 1969   Richmond et al.
3544973   December 1970    Borck et al.
3548382   December 1970    Lichty et al.
3560935   February 1971    Beers
3609704   September 1971   Schurter
3641505   February 1972    Artz et al.
3665173   May 1972         Bouricus et al.
3768074   October 1973     Sharp et al.
3795901   March 1974       Boehm et al.
3805039   April 1974       Stiffler
3820079   June 1974        Bergh et al.
3840861   October 1974     Amdahl et al.
3879712   April 1975       Edge et al.
3893084   July 1975        Kotok et al.
3895353   July 1975        Dalton
3991407   November 1976    Jordan, Jr. et al.
3997896   December 1976    Cassarino, Jr. et al.
4015243   March 1977       Kurpanek et al.
4015246   March 1977       Hopkins, Jr. et al.
4032893   June 1977        Moran
4096571   June 1978        Vander Mey
4096572   June 1978        Namimoto
4150428   April 1979       Inrig et al.
4159470   June 1979        Strojny et al.
4177510   December 1979    Appell et al.
4190821   February 1980    Woodward
4228496   October 1980     Katzman
4233682   November 1980    Liebergot et al.
4245344   January 1981     Richter
4253147   February 1981    McDougall et al.
4263649   April 1981       Lapp, Jr.
4279034   July 1981        Baxter
4304001   December 1981    Cope
4310879   January 1982     Pandeya
4323966   April 1982       Whiteside et al.
4347563   August 1982      Paredes et al.
4354267   October 1982     Mori et al.
4356546   October 1982     Whiteside et al.
4410983   October 1983     Cope
4428044   January 1984     Liron
4438494   March 1984       Budde et al.
4453215   June 1984        Reid
4484273   November 1984    Stiffler et al.
Foreign Patent Documents
1200155      Apr., 1970   GB
2060229      Apr., 1981   GB
WO81/00925   Apr., 1981   WO
Other References
Su, "A Hardware Redundancy Reconfiguration Scheme for Tolerating Multiple Module Failures" IEEE Trans. on Computers vol. C-29, No. 3, Mar. '80, pp. 254-258.
Katsuki, "Pluribus-An Operational Fault-Tolerant Multiprocessor" Proceedings of the IEEE vol. 66, No. 10, Oct. 1978, pp. 1146-1159.
Takaoka "N-Fail Safe Logical Systems" IEEE Trans. on Computers vol. C-20, No. 5, May 1971, pp. 536-542.
"Standard Specification for S-100 Bus Interface Devices" Computer vol. 12, No. 7, Jul. 1979, pp. 28-52.
Kogge, The Architecture of Pipelined Computers Hemisphere Publishing Corp. 1981, (Table of Contents).
Hamming, "Error Detecting & Error Correcting Codes" The Bell System Tech. J. vol. XXVI, Apr. 1950, No. 2, pp. 147-160.
Losq, "A Highly Efficient Redundancy Scheme: Self-Purging Redundancy" IEEE Trans. on Computers vol. C-25, No. 6, Jun. 1976, pp. 564-578.
Depledge, "Fault Tolerant Microcomputer Systems for Aircraft" Conference on Computer Systems & Technology Brighton, Sussex, England, Mar. 29-31, 1977.
Rennels, "Architecture for Fault-Tolerant Spacecraft Computers," Proceedings IEEE, vol. 66, No. 10, pp. 1255-1268, (1978).
Anderson & Lee, Fault Tolerance, Principles and Practice Prentice-Hall International, New Jersey, 1981, (Table of Contents).
The Bell System Technical Journal, Sep. 1964, pp. 1845-1847, 1872-1877, 1966-1980, 2021-2022.
Electronics, "Computers People Can Count On", Jan. 27, 1983, pp. 93-105.
Gorsline, G. W., Computer Organization Hardware/Software, Prentice-Hall, Inc. 1980, pp. 221-227.
AFIPS Conference Proceedings, vol. 41, Part II, 1972, "C.mmp-A Multi-Mini-Processor", W. A. Wulf & C. G. Bell, pp. 765-777.
Computer Design, "Design Motivations for Multiple Processor Microcomputer Systems", vol. 17, Mar. 1978, pp. 81-89.
Computing Surveys, "Multiprocessor Organization--A Survey", vol. 9, No. 1, Mar. 1977, pp. 103-129.
Mano M., Computer System Architecture, Prentice-Hall, Inc. 1982, pp. 454-473.
Primary Examiner:
Malzahn; David H.
Attorney, Agent or Firm:
Lahive & Cockfield
Parent Case Text
This application is a continuation of application Ser. No. 307,632, filed 10/1/81, now abandoned.
Claims
Having described the invention, what is claimed as new and secured by Letters Patent is set forth in the appended claims:
1. Digital data processor apparatus for continuous operation in the event of at least certain faults, said apparatus having plural functional units including at least a first central processing unit, a first memory unit, and a first control unit for a peripheral device, each said functional unit including a first signal processing section arranged for receiving signals transferred from other said functional units and for processing said received signals for producing output signals for transfer to other said functional units, said apparatus having the improvement comprising
A. bus means arranged for transferring signals at least between said processing unit and said memory unit and between said processing unit and said control unit,
B. system clock connected with said bus means and with said functional units for providing signals for synchronizing signal transfers between said first, second, and third functional units and for providing operational timing signals for said first, second, and third functional units;
C. a further, fourth functional unit arranged with said bus means for synchronously transferring information with other of said functional units, said fourth unit duplicating a selected one of said processor, memory, and peripheral control units and including a first signal processing section arranged for receiving signals transferred on said bus means and for processing said received signals for producing output signals identically to the response of said one selected unit to such received signals and for applying said output signals to said bus means for transfer to other of said units, said fourth functional unit being connected with said bus means for receiving operational timing signals from said system clock,
D. fault detection means for checking the operation of each of said one selected and fourth functional units in response to signals applied to each such unit identically from said bus means and for determining a fault condition in any such unit, said fault detection means comprising second signal processing means arranged for at least receiving signals transferred on said bus means and for processing said received signals synchronously and substantially identically with the first signal processing sections of each of said one selected and fourth units, and
E. logic means connected with said fault detection means and being responsive to the detection by said fault detection means of a fault condition in any one of said one selected and fourth functional units for inhibiting the unit detected as being faulty from applying potentially-faulty signals to said bus means.
2. Apparatus according to claim 1 in which
A. said fault detection means checks information which is ready in each of said one selected and fourth functional units for transfer to other units, and
B. said logic means responds to the detection of a fault in said information which is ready for transfer by inhibiting the transfer thereof by the unit detected as being faulty.
3. Apparatus according to claim 1 in which
A. said fault detection means checks information in each of said one selected and fourth functional units substantially concurrently with the transfer thereof to other units, and
B. said logic means responds to the detection of a fault in said information by signaling other of said units of said fault detection and by repeating the transfer of that information by the non-faulty duplicate unit.
4. Apparatus according to claim 1 further comprising
A. supply means for providing electrical operating power for said functional units, and
B. power logic means responsive independently to the level of said operating power at each of at least said one selected and fourth units for conditioning any of said one selected unit and said fourth unit from applying information transfer signals to said bus in the event said operating power at that unit is below a selected supply condition.
5. Apparatus according to claim 1 in which said bus means includes a third bus connected for applying said timing signals to said functional units for synchronizing the production of output signals by at least said one selected and fourth units.
6. Apparatus according to claim 1 further characterized in that said bus means comprises bus conductors common to all said functional units and applying to all said units any information transferred on said bus means by any one of said units.
7. Digital data processor apparatus characterized at least in part by continued operation in the event of an error-producing fault, said apparatus having at least first, second, and third functional units, one of which is a first central processing unit, another of which is a first memory unit, and another of which is a first control unit for a peripheral device, each said functional unit being responsive to input signals for producing output signals, said apparatus having the improvement comprising
A. bus means having at least first and second redundant buses, each of which is arranged for transferring signals between all said functional units and further arranged for applying to all said units information which is transferred to said bus means by any said unit,
B. fault detection means connected with said units and said buses for checking information transferred between said units and said buses for fault conditions on said buses, and
C. logic means connected with said fault detection means and with said units and being responsive to said fault detection means, said logic means responding to the absence of any detected fault condition on said first and second buses for providing at least selected information transfers identically and simultaneously on both said buses, and responding to the detection of a fault on a single one of said buses for conditioning said units to perform information transfers exclusively on the other of said buses.
8. Apparatus according to claim 7 in which
A. said fault detection means checks selected information substantially concurrently with the transfer thereof to said buses from said units, and
B. said logic means responds to the detection of a fault in said selected information by signaling other of said units of said fault detection and by repeating the transfer of that information.
9. Processor apparatus according to claim 7 further comprising clock means for applying timing signals to said bus means for synchronizing the normal transfer of identical information signals by said functional units on each of said first and second buses.
10. Apparatus according to claim 7, having the improvement further comprising a third bus in said bus structure providing operating signals to all said units both in the absence and in the presence of a detected fault condition.
11. Apparatus according to claim 10 in which said fault detection means includes a separate fault detection means in each said unit for detecting faults in that unit, each said separate fault detection means responding to the detection of a fault condition in that unit to apply at least one fault-responsive control signal to said third bus for transfer to other of said units.
12. Apparatus according to claim 7 in which said logic means includes means for responding to the detection of a fault in information being transferred from any said unit to any of said first and second buses to condition that unit for applying no further signals to any of said first and second buses.
13. Digital data processor apparatus having plural functional units including at least a central processing unit, a random access memory unit, and a control unit for a peripheral device, each said functional unit being responsive to input signals for producing output signals, and apparatus being characterized by
A. at least first and second buses, each of which is connected for transferring signals between said functional units and is further arranged for applying to all said units information transferred to that bus by any said unit,
B. at least a fourth functional unit arranged with said first bus and with said second bus for transferring information with other of said functional units identically as a selected one of said processor, memory, and peripheral control units, said fourth unit duplicating said one selected unit and responding to input signals received from any of said first and second buses to produce output signals identically to the response of said one selected unit to such input signals,
C. fault detection means connected with said units for checking information transfers of said units for detecting fault conditions in any said unit and in any said bus, and
D. logic means connected with said fault detection means and with said units and being responsive to said fault detection means, said logic means
(i) responding to the absence of any detected fault condition for providing information transfers on both said buses identically and simultaneously in at least one direction with both said one selected and fourth units,
(ii) responding to the detection of a fault condition in one of said one selected and fourth units for disabling that unit from driving information-transferring signals onto either said bus, and
(iii) responding to the detection of a fault condition on one said bus for conditioning all said units to respond only to information-transferring signals on the other said bus.
14. Apparatus according to claim 13 further comprising
A. supply means for providing electrical operating power for at least said one selected and fourth units, and
B. power logic means responsive independently to the level of said operating power at each of said one selected and fourth units for disabling such unit from applying information transfer signals to said buses in the event said operating power at that unit is below a selected supply condition.
15. Apparatus according to claim 13 in which
A. said apparatus includes a third bus connected with all said units and providing information transfers between said units,
B. each of said one selected and fourth units includes first and second signal-processing sections, each of which is arranged for receiving signals from said third bus and from any of said first and second buses, and for processing said received signals for producing output signals for application to said bus structure, and
C. said second fault detection means includes comparator means connected with the first and second signal-processing sections of each of said one selected and fourth units for comparing corresponding output signals from said first and second sections of that unit.
16. Apparatus according to claim 13 in which
A. both of said one selected and fourth units are selected from the functional units of a central processing unit, a memory unit, and a control unit for a synchronous device and
B. said logic means comprises means for operating said one selected and fourth units, in the absence of a detected fault condition in either of them, in lock-step synchronism with one another.
17. Apparatus according to claim 13 in which
A. both of said one selected and fourth units are control units for asynchronous devices, and
B. said logic means include means for operating said one selected and fourth units, in the absence of a detected fault condition in either of them, to receive from said bus structure substantially identical information-transferring signals.
18. Apparatus according to claim 13 in which said logic means includes means for providing information transfers which occur on both said buses with lockstep synchronism between said buses.
19. Apparatus according to claim 13 in which said fault detection means includes a separate fault detection means in each said unit for detecting faults in that unit,
each said separate fault detection means responding to the detection of a fault condition in that unit to produce at least one error-reporting signal.
20. Apparatus according to claim 13 further comprising a further conductor bus connected to all said units for applying thereto signals different from those on said first and second buses.
21. Apparatus according to claim 20 further comprising
A. an electrical supply applying electrical operating power to conductors of said further bus, and
B. processor timing means for applying timing signals to conductors of said further bus.
22. Apparatus according to claim 20 further comprising means in said fault detection means for applying to conductors of said further bus a first bus-error signal for reporting the detection of a fault condition on said first bus and a second bus-error signal for reporting the detection of a fault condition on said second bus.
23. Apparatus according to claim 13 further comprising
A. supply means for providing electrical operating power for said functional units, and
B. power logic means responsive to the level of said operating power at each such unit for disabling each unit from applying information transfer signals to said buses in the event said operating power at that unit is below a selected supply condition.
24. Apparatus according to claim 23
A. in which said supply means includes a separate power supply stage associated with each said unit and providing operating power for that unit, and
B. in which said power logic means includes a separate power logic stage associated with each said unit and connected with the supply stage associated therewith.
25. Apparatus according to claim 13 further comprising
A. at least a fifth functional unit duplicating a second selected unit not duplicated by said fourth functional unit,
B. first power supply means connected to provide electrical operating power for said one selected unit and said second selected unit, and
C. second supply means connected to provide electrical operating power for said fourth unit and said fifth unit,
each of said first and second supply means being arranged for operation independent of each other.
26. Digital data processor apparatus for continuous operation in the event of at least certain faults, said apparatus having plural functional units including at least a central processing unit, a memory unit, and a control unit for a synchronous peripheral device, one of said units being termed a first functional unit, each said functional unit being responsive to input signals for producing output signals, said apparatus having the improvement comprising
A. bus means arranged for transferring signals at least between said processing unit and said memory unit and between said processing unit and said peripheral control unit,
B. system clock connected with said bus means and with said functional units for providing signals for synchronizing signal transfers between said first, second, and third functional units and for providing operational timing signals for said first, second, and third functional units,
C. a fourth functional unit arranged with said bus means for synchronously transferring information with selected other of said units identically as said first unit, said fourth unit duplicating said first unit and responding to input signals to produce output signals identically to the response of said first unit to such input signals, said fourth unit being arranged with said bus means for receiving operational timing signals from said system clock,
D. fault detection means connected with said first and fourth units for checking the operation of each of said first and fourth functional units and for determining a fault condition in any such unit, and
E. logic means connected with said fault detection means and with at least said first and fourth units and being responsive to the detection of a fault condition in any one of said first or fourth functional units for inhibiting the unit detected as being faulty from applying potentially-faulty signals to said bus means, said logic means comprising means for operating said first and fourth units, in the absence of a detected fault condition in either of them, in lock-step synchronism with one another.
27. Apparatus according to claim 26 in which
A. said fault detection means includes means for checking information which is ready in each of said first and fourth functional units for transfer to other units, and
B. said logic means responds to the detection of a fault in said information which is ready for transfer by inhibiting the transfer thereof by the unit detected as being faulty.
28. Apparatus according to claim 26 in which
A. said fault detection means includes means for checking information in each of said first and fourth functional units substantially concurrently with the transfer thereof to other units, and
B. said logic means responds to the detection of a fault in said information by signalling other of said units of said fault detection and by repeating the transfer of that information by the non-faulty duplicate unit.
29. An information processing method for digital data processor apparatus having functional units including at least a first central processing unit, a first memory unit, and a first control unit for a peripheral device, each said functional unit including a first signal processing section arranged for receiving signals transferred from other said functional units and for processing said received signals for producing output signals for transfer to other said functional units, and further having bus means for transferring signals between said functional units, said method being characterized by at least partial continuous operation in the event of at least certain error-producing faults, and by the steps of
A. duplicating the response of a selected one of said processor, memory, and peripheral control units with a fourth unit that responds to input signals to produce output signals identical to the response of said one selected unit to such input signals, said fourth unit including a first signal processing section arranged for receiving signals transferred from other of said units and for processing said received signals for producing output signals identically to the response to said one selected unit to such received signals and for transferring said output signals to other of said units,
B. transferring information signals between said processor, memory, and peripheral control units, and further transferring information signals between said fourth unit and other of said units identically as between said selected one unit and other of said units,
C. providing system timing signals for synchronizing signal transfers between said first, second, and third functional units and for providing said operational timing signals to said first, second, and third functional units,
D. checking the operation of each of said one selected unit and said fourth unit in response to input signals each such unit receives identically with the other such unit, to detect a fault in the operation of either of said units, said checking step comprising duplicating the processing of signals received by each of said one selected and fourth units, said duplicative processing being performed by second signal processing means arranged for at least receiving signals transferred from other of said units and for processing said signals synchronously and substantially identically with the first signal processing sections of each of said one selected and fourth units, and
E. responding to a fault detection in either of said one selected and fourth units for inhibiting the unit detected as being faulty from applying information signals to other of said units.
30. An information processing method according to claim 29, wherein
said duplicating step includes operating each of said one selected and fourth units for producing output signals in response to signals received identically from said bus means in synchronism with one another.
31. An information processing method according to claim 29 further characterized by the steps of
A. performing said fault-checking operation for each of said one selected and fourth units concurrently with the transfer of signals produced by that unit to said bus means, and
B. responding to a detected fault condition by signalling other of said units of said fault detection and by repeating the transfer of signals for the timing cycle in which the detected fault occurred, said repeating of signals being performed with the non-faulty duplicate unit.
32. An information processing method according to claim 29 characterized by the further steps of
A. timing said fault-checking operation in each of said one selected and fourth units prior to the transfer of produced signals from that unit, and
B. responding to the detection of a fault to inhibit the transfer.
33. An information processing method according to claim 29 characterized by the further step of providing said timing signals to all said units by way of bus means common to all said units for operating at least said one selected and fourth units in lock-step synchronism with one another.
34. An information processing method according to claim 29 further characterized by the step of applying at least certain information signals being transferred between said units to any of first and second duplicative buses, each of which is arranged to apply to all said units all of said certain information signals.
35. An information processing method according to claim 34 further characterized by the steps of
A. receiving, at each said processing unit and at each said memory unit, signals from only a selected one of said first and second buses, and
B. transmitting from each processing unit and from each memory unit signals to both said first and second buses synchronously.
36. An information processing method according to claim 34 further characterized by the step of conditioning at least selected ones of said functional units to respond only to information transferring signals on one of said first and second buses which is free of a fault condition.
Description
TABLE OF CONTENTS
Reference to Related Applications
Background of the Invention
Summary of the Invention
Brief Description of Drawings
Description of Illustrated Embodiments
The Processor Module
Module Operation
Module Organization
Bus Structure Organization
Cycle Phases
Pipeline Phases
Arbitration Network
Central Processing Unit
CPU Fault Detection
CPU Operating Sequence
Memory Unit
Peripheral Control Units
Bus Interface Section
Communication Control Unit
Tape Control Unit
Central Power Supply
Clamp Circuit
Claims
REFERENCE TO RELATED APPLICATIONS
This application is related to the following commonly-assigned applications filed concurrently herewith:
______________________________________
Title                                                       Ser. No.
______________________________________
"Digital Data Processor With Fault Tolerant Bus Protocol"   307,436
"Digital Logic For Priority Determination"                  307,440 (abandoned)
"Central Processing Apparatus"                              307,525 (U.S. Pat. No. 4,453,215)
"Computer Memory Apparatus"                                 307,502 (Abandoned in favor of continuation application Ser. No. 698,257)
"Computer Peripheral Control Apparatus"                     307,524 (U.S. Pat. No. 4,486,826)
______________________________________
BACKGROUND OF THE INVENTION
This invention relates to digital computing apparatus and methods that provide essentially continuous operation in the event of numerous fault conditions. The invention thus provides a computer system that is unusually reliable. The computer system also is highly flexible in terms of system configuration and is easy to use in terms of sparing the user from concern in the event of numerous fault conditions. The system further provides ease of use in terms of programming simplifications and in the provision of relatively low-cost hardware to handle numerous operations.
Faults are inevitable in digital computer systems due, at least in part, to the complexity of the circuits and of the associated electromechanical devices, and to programming complexity. There accordingly has long been a need to maintain the integrity of the data being processed in a computer in the event of a fault, while maintaining essentially continuous operation, at least from the standpoint of the user. To meet this need, the art has developed a variety of error-correcting codes and apparatus for operation with such codes. The art has also developed various configurations of equipment redundancies. One example of this art is set forth in U.S. Pat. No. 4,228,496 for "multiprocessor system". That patent provides pairs of redundant processing modules, each of which has at least a processing unit and a memory unit, and which operates with peripheral control units. A fault anywhere in one processing module can disable the entire module and require the module paired with it to continue operation alone. A fault anywhere in the latter module can disable it also, so that two faults can disable the entire module pair.
This and other prior practices have met with limited success. Efforts to simplify computer hardware have often led to unduly complex software, i.e. machine programming. Efforts to simplify software, on the other hand, have led to excessive equipment redundancy, with attendant high cost and complexity.
It is accordingly a general object of this invention to provide a digital computer system which operates with improved tolerance to faults and hence with improved reliability.
Another object of the invention is to provide digital computer apparatus and methods for detecting faults and for effecting remedial action, and for continuing operation, with assured data integrity and essentially without disturbance to the user.
It is also an object of the invention to provide fault-tolerant digital computer apparatus and methods having both relatively uncomplicated software and a relatively efficient level of hardware duplication.
A further object of the invention is to provide fault-tolerant digital computer apparatus and methods which have a relatively high degree of decentralization of error detection and which operate with relatively simple corrective action in the event of an error-producing fault.
A further object of the invention is to provide fault-tolerant digital computer apparatus and methods of the above character which employ different error detection methods and structures for different system components for obtaining cost economies and hardware simplifications.
A more specific object of the invention is to provide a fault-tolerant computer system having a processor module with redundant elements in the bus structure and in the processing, the memory and the peripheral control units so arranged that the module can continue valid operation essentially uninterrupted even in case of faults in multiple elements of the module.
Other general and specific objects of the invention will in part be obvious and will in part appear hereinafter.
SUMMARY OF THE INVENTION
A computer system according to the invention has a processor module with a processing unit, a random access memory unit, and peripheral control units, and has a single bus structure which provides all information transfers between the several units of the module. The computer system can employ only a single such processor module or can be a multiprocessor system with multiple modules linked together. The bus structure within each processor module includes duplicate partner buses, and each functional unit can have a duplicate partner unit. Each unit, other than control units which operate with asynchronous peripheral devices, normally operates in lock-step synchronism with its partner unit. For example, the two partner memory units of a processor module normally both drive the two partner buses, and are both driven by the bus structure, in full synchronism.
Further in accord with the invention, the computer system provides fault detection at the level of each functional unit within a processor module. To attain this feature, error detectors monitor hardware operations within each unit and check information transfers between the units. The detection of an error causes the processor module to isolate the bus or unit which caused the error from transferring information to other units, and the module continues operation. The continued operation employs the partner of the faulty bus or unit. Where the error detection precedes an information transfer, the continued operation can execute the transfer at the same time it would have occurred in the absence of the fault. Where the error detection coincides with an information transfer, the continued operation can repeat the transfer.
The computer system can effect the foregoing fault detection and remedial action extremely rapidly, i.e. within a fraction of an operating cycle. A preferred embodiment, for example, corrects a questionable information transfer within two clock intervals after detecting a fault-manifesting error. The computer system of this embodiment hence has at most only a single information transfer that is of questionable validity and which requires repeating to ensure total data validity.
Although a processor module according to the invention can have significant hardware redundancy to provide fault-tolerant operation, a module that has no duplicate units is nevertheless fully operational. This feature enables a user to acquire a computer system according to the invention at the low initial cost for a non-redundant configuration and yet attain the full computing capacity. The user can add duplicate units to the system, to increase the fault-tolerant reliability, as best suited for that user and as economies allow. This is in contrast to many prior computers, which are not expandable in this manner. A computer system according to the invention and having no duplicate units nevertheless provides significant error detection and identification, which can save the user from the results of numerous faults. The attainment of this feature also enables a computer system which has duplicate units to remain operational during removal, repair and replacement of various units.
In general, a processor module according to the invention can include a back-up partner for each unit of the module. Hence, a module can have two central processing units, two main (random access) memory units, two disc control units, two communication control units, and two link control units for linking the processor module to another module to form a multiprocessor system. The module further can have a tape control unit, for operation with a magnetic tape memory, but which generally is not duplicated.
This redundancy enables the module to continue operating in the event of a fault in any unit. In general, all units of a processor module operate continuously, and with selected synchronism, in the absence of any detected fault. Upon detection of an error-manifesting fault in any unit, that unit is isolated and placed off-line so that it cannot transfer information to other units of the module. The partner of the off-line unit continues operating and thereby enables the entire module to continue operating, normally with essentially no interruption. A user is seldom aware of such a fault detection and transition to off-line status, except for the display or other presentation of a maintenance request to service the off-line unit.
In addition to the foregoing partnered duplication of functional units within a processor module to provide fault-tolerant operation, each unit within a processor module generally has a duplicate of hardware which is involved in a data transfer. The purpose of this duplication, within a functional unit, is to test, independently of the other units, for faults within each unit. Other structure within each unit of a module, including the error detection structure, is in general not duplicated.
The common bus structure which serves all units of a processor module preferably employs a combination of the foregoing two levels of duplication and has three sets of conductors that form an A bus, a B bus that duplicates the A bus, and an X bus. The A and B buses each carry an identical set of cycle-definition, address, data, parity and other signals that can be compared to warn of erroneous information transfer between units. The conductors of the X bus, which are not duplicated, in general carry module-wide and other operating signals such as timing, error conditions, and electrical power.
A processor module according to the invention detects and locates a fault by a combination of techniques within each functional unit including comparing the operation of duplicated sections of the unit, the use of parity and further error checking and correcting codes, and by monitoring operating parameters such as supply voltages. Each central processing unit in the illustrated computer system, as one specific example, has two redundant processing sections which operate in lock-step synchronism. An error detector compares the operations of the redundant sections and, if the comparison is invalid, isolates the processing unit from transferring information to the bus structure. This isolates other functional units of the processor module from any faulty information which may stem from the processing unit in question. Each processing unit also has a stage for providing virtual memory operation and which is not duplicated. Rather, the processing unit employs parity techniques to detect a fault in this stage.
The random access memory unit of the illustrated computer system is arranged with two non-redundant memory sections, each of which is arranged for the storage of different bytes of a memory word. The unit detects a fault both in each memory section and in the composite of the two sections, with an error-correcting code. Again, the error detector disables the memory unit from transferring potentially erroneous information onto the bus structure and hence to other units.
The memory unit is also assigned the task in the illustrated processor module of checking the duplicated bus conductors, i.e. the A bus and the B bus. For this purpose, the unit has parity checkers that test the address signals and that test the data signals on the bus structure. In addition, a comparator compares all signals on the A bus with all signals on the B bus. Upon determining in this manner that either bus is faulty, the memory unit signals other units of the module, by way of the X bus, to obey only the non-faulty bus.
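The bus check described above can be sketched in a few lines. This is only an illustrative model, not the circuit of the patent: the names `BusSample`, `parity_ok` and `select_bus` are hypothetical, even parity is an assumption, and the "none" result for two buses that both pass parity yet disagree is an assumption, since the passage does not state what happens in that case.

```python
from dataclasses import dataclass


def parity_ok(bits, parity_bit):
    """Even-parity check (an assumption): data bits plus parity bit hold an even number of ones."""
    return (sum(bits) + parity_bit) % 2 == 0


@dataclass(frozen=True)
class BusSample:
    """Hypothetical snapshot of the signals on one bus during one transfer."""
    bits: tuple
    parity: int


def select_bus(a, b):
    """Return which bus the units should obey: 'both', 'A', 'B' or 'none'."""
    a_ok = parity_ok(a.bits, a.parity)
    b_ok = parity_ok(b.bits, b.parity)
    if a_ok and b_ok:
        # Both parities are valid, so the comparator has the final word.
        return "both" if a == b else "none"
    if a_ok:
        return "A"   # B bus failed parity: obey only the A bus
    if b_ok:
        return "B"   # A bus failed parity: obey only the B bus
    return "none"
```

In this sketch the returned value stands in for the signal that the memory unit would place on the X bus to tell the other units which bus to obey.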
Peripheral control units for a processor module according to the invention employ a bus interface section for connection with the common bus structure, duplicate control sections termed "drive" and "check", and a peripheral interface section that communicates between the control sections and the peripheral input/output devices which the unit serves. There typically are a disc control unit for operation with disc memories, a tape control unit for operation with tape transports, a communication control unit for operation, through communication panels, with communication devices including terminals, printers and modems, and a link control unit for interconnecting one processor module with another in a multiprocessor system. In each instance the bus interface section feeds input signals to the drive and check control sections from the A bus and/or the B bus, applies output signals from the drive channel to both the A bus and the B bus, tests for logical errors in certain input signals from the bus structure, and tests the identity of signals output from the drive and check channels. The drive control section in each peripheral control unit provides control, address, status, and data manipulating functions appropriate for the I/O device which the unit serves. The check control section of the unit is essentially identical for the purpose of checking the drive control section. The peripheral interface section of each control unit includes a combination of parity and comparator devices for testing signals which pass between the control unit and the peripheral devices for errors.
A peripheral control unit which operates with a synchronous I/O device, such as a communication control unit, operates in lock-step synchronism with its partner unit. However, the partnered disc control units, for example, operate with different non-synchronized disc memories and accordingly operate with limited synchronism. For example, the partner disc control units perform write operations concurrently but not in precise synchronism inasmuch as the disc memories operate asynchronously of one another. A link control unit and its partner also typically operate with this limited degree of synchronism.
The power supply unit for the foregoing illustrated processor module employs two bulk power supplies, each of which provides operating power to only one unit in each pair of partner units. Thus, one bulk supply feeds one duplicated portion of the bus structure, one of two partner central processing units, one of two partner memory units, and one unit in each pair of peripheral control units. The bulk supplies also provide electrical power for non-duplicated units of the processor module. Each unit of the module has a power supply stage which receives operating power from one bulk supply and in turn develops the operating voltages which that unit requires. This power stage in addition monitors the supply voltages. Upon detecting a failing supply voltage, the power stage produces a signal that clamps to ground potential all output lines from that unit to the bus structure. This action precludes a power failure at any unit from causing the transmission of faulty information to the bus structure.
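The clamping behaviour of the power stage can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name `drive_outputs`, the 5 V nominal level and the 5 percent tolerance are hypothetical values chosen for the example and do not come from the patent.

```python
def drive_outputs(intended, supply_volts, nominal=5.0, tolerance=0.05):
    """Return the levels a unit actually presents to the bus structure.

    intended      -- list of 0/1 levels the unit logic wants to drive
    supply_volts  -- measured supply voltage of this unit
    nominal, tolerance -- assumed limits for a healthy supply
    """
    supply_ok = abs(supply_volts - nominal) <= nominal * tolerance
    if supply_ok:
        return list(intended)
    # Failing supply: clamp every output line to ground so that the unit
    # cannot place faulty information on the bus structure.
    return [0] * len(intended)
```

For example, `drive_outputs([1, 0, 1], 4.2)` yields all zeros in this model, whereas the same call with a 5.0 V supply passes the intended levels through unchanged.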
A further feature of the invention is that some units of the processor module execute each information transfer with an operating cycle that includes an error-detecting timing phase prior to the actual information transfer. A unit which provides this operation, an example of which is a control unit for a peripheral device, thus tests for a fault condition prior to effecting an information transfer. The unit inhibits the information transfer in the event a fault is detected. The module, however, can continue operation, without interruption or delay, and effect the information transfer from the non-inhibited partner unit.
Other units of the processor module, generally including at least the central processing unit and the memory unit, for which operating time is of more importance, execute each information transfer concurrently with the error detection pertinent to that transfer. In the event a fault is detected, the unit immediately produces a signal which alerts other processing units to disregard the immediately preceding information transfer. The processor module can repeat the information transfer from the partner of the unit which reported a fault condition. This manner of operation produces optimum operating speed in that each information transfer is executed without delay for the purpose of error detection. A delay only arises in the relatively few instances where a fault is detected.
The invention in one embodiment embraces digital data processor apparatus having at least a central processing unit, a random-access memory unit, a control unit for a mass storage device, and a control unit for a communication device, and further featuring a bus structure having redundant first and second buses and a third bus. The buses are connected with all the units for operating the units and for providing information transfers between them. Fault detection means check each information transfer between any unit and any one or more of the first bus and the second bus. The fault detection means detect fault conditions in a unit and in each of the first and second buses. The embodiment further features logic means responsive to the fault detection means and responding to the absence of any detected fault condition for providing information transfers on both the first bus and the second bus and responding to the detection of a fault in one of the first and second buses to condition all the units to respond only to information-transferring signals on the other of the first and second buses.
A further feature for practice with such an embodiment has a separate fault detection means in each unit for detecting faults in that unit, each separate fault detection means responding to the detection of a fault condition in that unit to apply at least one fault-reporting signal to the third bus for transfer to other units.
The practice of the invention can also provide a priority-determining feature which is characterized in that each of not more than 2^n units connected to the bus structure, where (n) is an integer greater than one, can initiate an information transfer by way of the bus structure and each such unit selectively has a transfer-request signal. At least the third bus or each of the first and second buses has at least (n) conductors for providing priority selection among those units. The apparatus in this instance has plural arbitration circuit means each of which is associated with a different one of the transfer-initiating units. Each arbitration circuit means is connected with the (n) selection conductors, and responds to a transfer-request signal in the associated unit to apply to the selection conductors a parallel rank-responsive digital signal responsive to a unique priority-rank of that unit, and to produce a transfer-initiate output signal in the absence of a rank-responsive signal on the selection conductors from a higher-priority rank. This arbitration logic operates in a single timing interval and requires minimal bus conductors and logic circuitry. Further, it can determine priority for any of numerous operations, including bus requests, channel requests and priority interrupt requests.
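The outcome of this arbitration can be modelled abstractly as shown below. The sketch captures only the stated rule, namely that a requesting unit obtains a transfer-initiate signal when no higher-priority rank is requesting, and that each unit reaches this decision on its own. It does not model the selection conductors or their electrical arrangement. The function name `arbitrate` and the convention that a larger number means a higher priority are assumptions.

```python
def arbitrate(requests, n):
    """Return the rank that may initiate a transfer, or None if none requests.

    requests -- dict mapping each unit's unique rank to True when that unit
                has a transfer-request signal
    n        -- number of selection conductors; ranks must lie in range(2**n)
    """
    for rank in requests:
        if not 0 <= rank < 2 ** n:
            raise ValueError(f"rank {rank} does not fit on {n} conductors")

    winners = []
    for rank, wants_bus in requests.items():
        if not wants_bus:
            continue
        # Each arbitration circuit decides independently: it grants itself
        # the transfer only if no higher-priority rank is also requesting.
        higher_pending = any(w and r > rank for r, w in requests.items())
        if not higher_pending:
            winners.append(rank)

    # Ranks are unique, so at most one unit can win.
    return winners[0] if winners else None
```

With four conductors, for instance, up to sixteen units can take part; a call such as `arbitrate({3: True, 9: True, 12: False}, 4)` grants the transfer to rank 9 in this model.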
A processor module of the foregoing character can also employ, pursuant to a feature of the invention, supply means for providing electrical operating power for the processor, memory and control units, and power logic means responsive to the level of operating power for preventing those units from applying information transfer signals to the buses in the event the operating power is below a selected supply condition.
The central processing unit and the fault detection means of a processor module can include, according to a feature of the invention, first and second processing sections, each of which is arranged for receiving signals from the third bus and from either of the first and second buses, for providing identical processing in response to the received signals, and for producing output signals for application to the bus structure. There also is provided comparator means for comparing corresponding output signals from the first and second processing sections. The comparator means detects fault conditions in the processing unit in response to that signal comparison. The comparator means can also compare corresponding signals which the first and second processing sections receive from the bus structure, and detect a fault condition in response to that comparison of received signals.
The memory unit and the fault detection means of a processor module can include, as a feature of the invention, first and second random access memory sections, each of which is arranged for storing portions of memory words and which together store complete memory words. Means are provided for writing into each memory section a memory word portion received from any of the first and second buses, and means are provided for reading a complete memory word from both memory sections and for applying the memory word selectively to the first and second buses. There is also provided means for checking memory-word parity and for detecting a fault condition in response to invalid memory-word parity.
At least one control unit and the fault detection means of a processor module according to the invention can employ, pursuant to yet another feature, first and second device controlling sections, each of which is arranged to receive signals from at least any of the first and second buses, and each of which is arranged for providing identical operations in response to the received signals and for producing output signals in response to those operations. At least the first such device is arranged to apply output signals to both the first bus and the second bus and to apply output signals to a device connected therewith. This embodiment further employs comparator means for comparing corresponding output signals from the first and second controlling sections. The comparator means detects fault conditions in the one control unit in response to such a signal comparison.
The invention in another embodiment embraces digital data processor apparatus having first and second redundant central processing units, first and second redundant random access memory units, at least a first control unit for a peripheral device, and at least first and second buses, each of which is connected for transferring information between the aforesaid units. Fault detection means are provided for checking each information transfer between units. The fault detection means detects fault conditions in any unit and in any bus. Logic means responsive to the fault detection means are also provided. The logic means respond to the absence of any detected fault condition for providing information transfers on both the buses and identically with both the central processing units and identically with both the memory units, and respond to the detection of a fault in one processing unit to inhibit that unit from driving information-transferring signals onto either bus. The logic means further respond to the detection of a fault in one memory unit to inhibit that unit from driving information-transferring signals onto either bus, and respond to the detection of a fault in one bus to condition all the units to respond only to information-transferring signals on the other bus.
It is also a feature that the logic means provide information transfers which occur on both the buses with lockstep synchronism between the buses.
The invention in a further embodiment embraces digital data processor apparatus having at least one central processing unit, at least one memory unit, at least two control units for peripheral processor devices, and a bus structure connected with each unit for transferring information between the units, and being characterized in that not more than 2^n units which are connected to the bus structure, where (n) is an integer of two or greater, can initiate an information transfer by way of the bus structure, and in that each such unit selectively has a transfer request signal. There are at least (n) selection conductors connected with each transfer-initiating unit, and there are plural arbitration circuits, each of which is associated with a different one of the transfer-initiating units. Each arbitration circuit is connected with the selection conductors, and responds in a single timing interval to a transfer request signal in the associated unit to apply to the selection conductors a parallel rank-responsive digital signal responsive to a priority-rank of that unit, and to produce a transfer initiate signal in the absence of a rank-responsive signal on the selection conductors from a higher-priority rank. Further features are that each arbitration circuit produces the rank-responsive signal with not more than (n) digits, and that each selection conductor is assigned a digit position and is arranged with a number of electrically-isolated conductor segments according to the assigned digit position.
Central processing apparatus according to the invention provides programmable processing of digital information including the transfer of digital information with memory apparatus and with peripheral apparatus by way of any of first and second duplicative buses, and features first and second programmable digital data processing means that are at least substantially alike. Each processing means is arranged for receiving, and for producing, information-transferring signals, and for applying produced signals to at least one bus. Multiplex means connected with the processing means apply the information-transferring signals from either of the first and second buses to both processing means. Further, means are provided for comparing produced signals from the first processing means with those from the second processing means and for producing a fault-reporting signal in response thereto.
The central processing apparatus also features timing control means for operating each processing means to process successive operations from different information-transferring sequences.
Random-access computer memory apparatus according to the invention reads and writes digital information transferred to and from other computer apparatus by way of a bus structure having at least first and second duplicative buses, and further features first and second random access memory means, each of which is arranged for storing portions of memory words and which together are arranged for storing complete memory words. Multiplexor means apply word portions received from any one of the first and second buses to both memory means. Output means apply each memory word portion read from the memory means to both the first and second buses, and code checking means are in circuit with the output means for responding to invalid read-word error checking code to produce a fault-reporting signal.
It is also a feature of the invention to provide in such memory apparatus first code-introducing means for providing a selected code in each word portion applied to each of the memory means, and second code-introducing means for providing a selected further code in each two-portion word applied to the two memory means. The second code-introducing means, in a preferred embodiment, includes means for providing the further code such that the code checking means can detect and correct any single bit error in a memory word.
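The idea of a code that detects and corrects any single bit error can be illustrated with a textbook Hamming(7,4) code. This is a stand-in chosen for brevity: the passage does not name the code used, and the word width, bit layout and function names below are assumptions rather than the arrangement of the patented memory unit.

```python
def hamming74_encode(data_bits):
    """Encode four data bits into a seven-bit Hamming codeword.

    Positions are numbered 1..7; check bits sit at positions 1, 2 and 4,
    data bits at positions 3, 5, 6 and 7.
    """
    code = [0] * 8                      # index 0 is unused
    code[3], code[5], code[6], code[7] = data_bits
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_correct(codeword):
    """Return (data_bits, syndrome); a nonzero syndrome is the corrected position."""
    syndrome = 0
    for position, bit in enumerate(codeword, start=1):
        if bit:
            syndrome ^= position        # XOR of the positions holding a one
    fixed = list(codeword)
    if syndrome:
        fixed[syndrome - 1] ^= 1        # flip the single erroneous bit
    data_bits = [fixed[2], fixed[4], fixed[5], fixed[6]]
    return data_bits, syndrome
```

A syndrome of zero means the word read back is consistent; any other value identifies the one bit position to invert before the word is applied to the buses.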
These and other features of the invention enable a computer system to operate without transferring potentially faulty information from one functional unit to another, except in selected instances where the system attends to the transmission of potentially faulty information within a few clock phases at most of the fault and hence well within a single operating cycle.
The invention attains these and other features as set forth hereinafter with apparatus and methods that detect error-manifesting faults at the functional level of a central processing unit, a memory unit, or individual peripheral control units. As deemed preferable for reliability, the fault detection is implemented in each such unit at a point close to the connection of the unit to other units and/or devices. Further, the detection of error-manifesting faults can readily be distributed timewise so that every timing phase causes an error-checking operation.
The invention accordingly comprises the several steps and the relation of one or more of such steps with respect to each of the others, and the apparatus embodying features of construction, combinations of elements and arrangements of parts adapted to effect such steps, all as exemplified in the following detailed disclosure, and the scope of the invention is indicated in the claims.
BRIEF DESCRIPTION OF DRAWINGS
For a fuller understanding of the nature and objects of the invention, reference should be made to the following detailed description and the accompanying drawings, in which:
FIG. 1 is a block schematic representation of a computer system according to the invention;
FIG. 2 shows a set of timing diagrams illustrating operation of the bus structure of the computer system of FIG. 1;
FIG. 3 is a schematic representation of arbitration circuits for use in the system of FIG. 1;
FIG. 4 is a functional block representation of central processing units for the system of FIG. 1;
FIGS. 5A and 5B form a block schematic diagram of one central processing unit according to the invention;
FIG. 6 shows timing diagrams illustrating operation of the central processing unit of FIGS. 5A and 5B;
FIGS. 7 and 8 are diagrams illustrating operating sequences of the central processing unit of FIGS. 5A and 5B;
FIG. 9 is a block schematic diagram of a memory unit according to the invention;
FIG. 10 is a block schematic diagram of memory unit control logic according to the invention;
FIG. 11 is a functional block representation of a standard interface section of a control unit according to the invention;
FIGS. 12A and 12B form a block schematic diagram of an interface section according to FIG. 11;
FIG. 13 is a block diagram of control circuitry for the interface section of FIGS. 12A and 12B;
FIG. 14 is a block schematic diagram of control sections and a further interface section for a communication control unit according to the invention;
FIG. 15 is a block schematic diagram of a control circuit for a pair of communication control units according to the invention;
FIG. 16 shows a section of a tape control unit according to the invention;
FIG. 17 is a block schematic diagram of a power supply arrangement according to the invention;
FIG. 18 is a block schematic diagram of a power supply stage according to the invention;
FIG. 19 shows timing diagrams illustrating the operation of the circuit of FIG. 18; and
FIG. 20 shows a clamp circuit for use in practicing the invention.
DESCRIPTION OF ILLUSTRATED EMBODIMENTS
The Processor Module
A processor module 10 according to the invention has, as FIG. 1 shows, a central processing unit (CPU) 12, a main memory unit 16, and control units for peripheral input/output devices and including a disc control unit 20, a communication control unit 24 and a tape control unit 28. A single common bus structure 30 interconnects the units to provide all information transfers and other signal communications between them. The bus structure 30 also provides operating power to the units of the module from a main supply 36 and provides system timing signals from a main clock 38.
A module 10 as shown can be connected with a disc memory 52, a communication panel 50 for hooking up communication devices, and with a tape transport 54 to form a complete, single-processor computer system. However, the illustrated module 10 further has a link control unit 32 for connection to other like processor modules by way of a linking bus structure 40. In this manner the module 10 forms part of a multiprocessor computer system.
The bus structure 30 includes two identical buses 42 and 44, termed an A bus and a B bus, and has an X bus 46. In general, the signals on the A bus and on the B bus execute information transfers between units of the module 10. Accordingly, these buses carry function, address, and data signals. The X bus in general carries signals that serve more than one other unit in the module and including main power, timing, status and fault-responsive signals.
With further reference to FIG. 1, each functional unit of the module 10 in accordance with the invention can have a back-up redundant partner unit. Accordingly, the illustrated module has a second central processing unit 14, a second memory unit 18, a second disc control unit 22, a second communication control unit 26, and a second link control unit 34. The system does not have a second tape control unit although such can be provided. It often is not cost effective in a computer system to provide full redundancy with a second tape control unit. Moreover, the absence from the FIG. 1 system of a second tape control unit illustrates that a computer system according to the invention can provide different degrees of tolerance to faults. Thus, not only can a second tape control unit be provided where a user's needs make this desirable, but conversely the system in FIG. 1 can be implemented with any one or more of the illustrated second units omitted.
Each unit 12 through 28, 32 and 34 is connected to all three buses of the bus structure 30. This enables each unit to transfer signals on either or both the A bus and the B bus, as well as on the X bus.
Module Operation
The basic operation of the system 10 is that, in the absence of a fault, the partner central processing units 12 and 14 operate in lock-step synchronism with one another. Thus, both units drive the A bus and the B bus identically, and both are driven identically by the two buses. The same is true for the partner memory units 16 and 18 and again for the partner communication control units 24 and 26. Further, both communication control units 24 and 26 jointly drive and are driven by a communication bus 48 that connects to one or more communication panels 50 which are connected to conventional communication devices such as keyboards, cathode ray tube terminals, printers and modems.
The disc control units 20 and 22, on the other hand, do not operate in full synchronism with one another because the disc memories 52, 52 with which they function operate asynchronously of one another. During fault-free operation, each disc control unit 20 and 22 writes data received from one bus 42, 44 in one memory 52 connected with it. Hence two disc memories, each connected to a different disc control unit, contain identical data. During a read operation, the system reads the stored data from one of these two memories 52 depending on which control unit 20, 22 is available and can effect the read operation in the least time, which typically means with the shortest access time. The two link controllers 32 and 34, moreover, typically are operated independently of one another.
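The behaviour of the partnered disc control units can be modelled roughly as follows. The class `MirroredDiscs`, its fields and the explicit `access_times` argument are hypothetical; the sketch only reflects the statements that both disc memories receive the same data and that a read is served by whichever available control unit can complete it in the least time.

```python
class MirroredDiscs:
    """Toy model of two disc control units, each with its own disc memory."""

    def __init__(self):
        self.discs = [{}, {}]            # block number -> data, one dict per disc
        self.available = [True, True]    # False once a control unit is off-line

    def write(self, block, data):
        # Every available control unit stores the same data, so the two
        # disc memories hold identical contents during fault-free operation.
        for index, disc in enumerate(self.discs):
            if self.available[index]:
                disc[block] = data

    def read(self, block, access_times):
        """Read from the available disc with the shortest access time."""
        candidates = [i for i in (0, 1) if self.available[i]]
        if not candidates:
            raise RuntimeError("no disc control unit is available")
        chosen = min(candidates, key=lambda i: access_times[i])
        return chosen, self.discs[chosen][block]
```

If one control unit goes off-line, the same `read` call simply falls back to the remaining disc, which is the sense in which the pair tolerates a single fault in this model.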
The units 12 through 28 and 32 and 34 of the processor module of FIG. 1 check for fault conditions during each information transfer. In the event a fault is detected, the unit in question is immediately disabled from driving information onto the bus structure 30. This protects the computer system from the transfer of potentially faulty information between any units. The partner of the faulted unit, however, continues operating. The system can thus detect a fault condition and continue operating without any interruption being apparent to the user. The processor module 10 provides this fault-tolerant operation by means of the system structure, i.e. hardware, rather than with an operating system or other software program.
The peripheral control units 20, 22, 24, 26, 28, 32, 34 in the illustrated computer system transfer information to other units with an operating sequence that checks for a fault prior to driving the information onto the bus structure 30. In the event of a fault, the faulty unit is inhibited from executing the information drive step, and remains off-line. Operation continues, however, with the partner unit alone driving the information onto the bus structure.
It is more time-efficient, however, for information transfers from the central processing units and from the memory units to proceed without any delay for fault checking. Accordingly, the illustrated central processing units 12 and 14 and illustrated memory units 16 and 18 operate with a sequence in which information is driven onto the bus structure without delay for fault checking. The fault check instead is performed concurrently. In the event of an error-producing fault, during the next clock phase the unit in question drives onto the bus structure a signal instructing all units of the module to disregard the item of information which was placed on the bus structure during the preceding clock phase. The module then repeats the information driving clock phase using only the good partner unit, i.e. the one free of detected faults. The repeat operation aborts the subsequent transfer cycle which would otherwise have driven data onto the bus structure during this subsequent clock phase; that subsequent cycle must be repeated in its entirety.
The processor module 10 of FIG. 1 thus operates in a manner in which a data transfer from any peripheral control units is delayed for one clock phase to provide for a fault-checking step, whereas transfers from the CPU or memory proceed without such delay and are cancelled in the event of a fault detection. In either of the foregoing instances, after completion of an information transfer during which a fault condition was detected, the potentially faulty unit remains isolated from driving information onto the A bus or the B bus, and the partner of the faulty unit continues operating.
Module Organization
FIG. 1 also shows that the central processing unit 12, identical to the partner unit 14, has two processor sections 12a and 12b, a MAP 12c connected with the two processing sections to provide virtual memory operation, a control section 12d and transceivers 12e that transfer signals between the processing unit and the buses 42, 44 and 46. The two processor sections 12a and 12b are provided for purposes of fault detection within the unit 12. They operate essentially identically and in total synchronism with one another. A comparator 12f compares signals output from the two processing sections and produces a fault signal if corresponding signals from the two sections differ. In response to the fault signal, the control section, among other operations, produces an error signal that the X bus 46 transmits to all units of the module 10. The control section then isolates that unit from driving further signals onto the bus structure 30.
The error signal which the failing unit sends to other units is, in the illustrated module, a pair of signals termed an A Bus Error signal and a B Bus Error signal. Any illustrated unit in the module 10 produces this pair of signals on the X bus when it detects certain error-producing faults. Any failing unit also produces an interrupt signal that causes the central processing unit of the module to interrogate the different units to locate the faulty one.
The central processing unit 12 receives power from one of two identical bulk supplies 36a and 36b in the main power supply 36. The partner CPU 14 receives main power from the other bulk supply. Hence a failure of one bulk supply disables only one of the two partner CPUs 12 and 14, and does not impair the other. The control section 12d in the unit 12 has a power stage that produces supply voltages for the CPU 12. The power stage monitors the bus supply voltage from the main system supply 36, and monitors the further voltages it produces, to produce power fault signals. As noted, the hardware of the CPU 12 responds to any fault condition which is developed within the unit to, among other operations, disable the drivers of the transceivers 12e from sending potentially erroneous information from the unit 12 to the bus structure.
With further reference to FIG. 1, the main memory unit 16, identical to the partner memory unit 18, has a random access memory (RAM) that is divided into two RAM sections 16a and 16b. A transceiver 16c is connected with the A bus 42 and the X bus 46 and an identical transceiver 16d is connected with the B bus 44 and the X bus 46. A format section 16e of multiplex, ECC and compare circuitry in the memory unit couples either the A bus or the B bus with the RAM sections 16a and 16b for each memory write operation. A read operation, however, drives data read from the RAM sections onto both buses 42 and 44.
An error checking and correcting (ECC) portion of the memory unit section 16e provides an error checking code on every word written into the RAM sections 16a and 16b and checks the code during each memory read operation. Depending on the syndrome of the error detected in the ECC portion of the section 16e, the memory unit raises a fault signal that is sent to all units of the module 10. More particularly, the faulty memory unit asserts both Bus Error signals. Depending on status set in that memory unit, it either corrects the data and re-transmits it on the A and B buses, or goes off-line. The partner memory unit, if present, responds to the Bus Error signals and re-transmits the correct data.
In addition to testing for faults within the unit, the memory unit 16 provides fault detection for the A and B buses of the module 10. For this purpose, the compare portion of the format section 16e compares all signals which the memory unit 16 receives from the A bus 42 with those the unit receives from the B bus 44. When the module 10, and particularly the buses 42 and 44, are operating without fault, the A bus and the B bus carry identical and synchronized signals. If the signals differ, the compare portion of the section 16e can note the fault. The format section 16e also tests the code of received signals and produces an error signal identifying any bus which has a coding error. The X bus 46 communicates this Bus Error signal to all units of the module 10 to instruct that each disregard the signals on that bus.
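As a rough illustration of the bus check just described, the following Python sketch models a receiver comparing what it sees on the A bus and the B bus in one timing phase. It assumes the coding test is the even parity used elsewhere on the bus structure; the function names and the list-of-bits representation are hypothetical and are not taken from the illustrated circuitry.

```python
def has_even_parity(bits):
    """True when the number of asserted bits, parity bit included, is even."""
    return sum(bits) % 2 == 0


def check_buses(a_bits, b_bits):
    """Return (bus_a_error, bus_b_error, mismatch) for one timing phase.

    a_bits / b_bits: sequences of 0/1 values received from each bus, with
    the parity bit as the last element (an assumed layout).
    """
    bus_a_error = not has_even_parity(a_bits)   # coding error on the A bus
    bus_b_error = not has_even_parity(b_bits)   # coding error on the B bus
    mismatch = list(a_bits) != list(b_bits)     # compare portion: buses differ
    return bus_a_error, bus_b_error, mismatch
```

For example, `check_buses([1, 1, 1, 0], [1, 0, 1, 0])` flags a coding error on the A bus only, together with a mismatch between the two buses.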
The disc control unit 20, identical to the partner disc control unit 22, has a bus interface section 20a, two identical disc control sections 20b and 20c, and a disc interface section 20d. The bus interface section 20a, which in the illustrated system is essentially standard for all control units, couples input signals from either the A bus 42 or the B bus 44, with a multiplexer, to the disc control sections 20b and 20c. It also applies output signals to the A bus and the B bus. However, prior to applying output signals to the buses, the bus interface section 20a compares output signals from the two control sections 20b and 20c and, in the event of an invalid comparison, disables output drivers in the interface section to prevent potentially erroneous signals from being applied to the bus structure 30. The disc control unit 20 receives operating power from one main bulk supply 36a and the partner unit 22 receives operating power from the other bulk supply 36b.
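The check-before-drive behavior of the bus interface section can be sketched in Python as follows. This is only a behavioral model under stated assumptions: the class and method names are hypothetical, and the model assumes that once the drivers are disabled they stay disabled until some external repair action that is not modeled here.

```python
class PeripheralBusInterface:
    """Hypothetical model of a check-before-drive bus interface section."""

    def __init__(self):
        self.drivers_enabled = True

    def drive(self, output_b, output_c):
        """Return the value driven onto the buses, or None if nothing is driven.

        output_b / output_c: outputs of the two duplicated control sections.
        """
        if not self.drivers_enabled:
            return None
        if output_b != output_c:
            # Invalid comparison: disable the output drivers so that
            # potentially erroneous signals never reach the bus structure.
            self.drivers_enabled = False
            return None
        return output_b
```

In this model the partner unit, which is not shown, would continue to drive the buses alone after a miscompare.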
Each illustrated disc control section 20b and 20c has a programmed microprocessor which provides read and write operations and associated control operations for operating the disc memories 52. Two sections are provided to facilitate checking operations within the unit 20. The disc interface section 20d applies control and write data signals from the unit to the disc memories, and applies status and read data signals from the disc memories to the control sections. The disc interface section 20d tests various signals for error-producing faults with parity and comparison techniques.
With continued reference to FIG. 1, the communication control unit 24, like the identical partner 26, has a bus interface section 24a identical in large part at least to the interface section 20a of the disc unit 20. The communication unit 24 also has two communication sections 24b and 24c and a communication interface section 24d. There is also a lock-step circuit 24e that brings the unit 24 into exact synchronism with the partner unit 26. The bus interface section 24a functions essentially like the bus interface section 20a of the disc control unit. In the illustrated module, the communication control section 24b serves as a drive section to provide control, address, data and status functions for the communication panels 50, and the other section serves as a check section to duplicate these operations for error checking purposes. The communication interface section 24d provides error checking functions similar to those described with regard to the disc interface section 20d of the disc control unit 20.
Similarly, the link control unit 32, which is identical to the partner unit 34, has a bus interface section 32a connected with two redundant link control sections 32b and 32c and has a link interface section 32d connected between the two control sections and the conductor set 40a of the link 40. The partner unit 34 connects with the other conductor set 40b.
The single tape control unit 28 is constructed basically like the other control units with a bus interface section 28a connected with all three buses 42, 44 and 46 of the bus structure 30, with two tape control sections 28b and 28c, and with a tape interface section 28d that connects with a tape transport 54.
Bus Structure Organization
The bus structure 30 which interconnects all units of the FIG. 1 processor module connects to the units by way of a backplane which has an array of connectors, to which the units connect, mounted on a panel to which the bus conductors are wired. The backplane is thus wired with duplicated conductors of the A bus 42 and the B bus 44 and with non-duplicated conductors of the X bus 46.
The illustrated module of FIG. 1 operates in one of three bus or backplane modes; namely, obey both the A bus and the B bus, obey the A bus, or obey the B bus. In all three modes, the A bus and the B bus are driven with identical signals in lock-step synchronization, but units actuated to receive data ignore the other bus in the Obey A mode and in the Obey B mode. In all modes, parity is continually generated, and checked, and any unit may signal that either bus is potentially faulty by producing a Bus A Error signal and/or a Bus B Error signal, depending on which bus appears to have a fault. All units in the system respond to such a single Bus Error signal and switch to obey only the other bus. The central processing unit can instruct all the units simultaneously to switch operating modes by broadcasting a mode instruction.
The module clock 38, FIG. 1, which applies main clock signals to all units by way of the X bus 46, provides main timing for the transfer of information from one unit to another. To facilitate the production of properly phased timing sequences in different units of the module, the main clock 38 produces, as FIG. 2 shows with waveforms 56a and 56b, both clock and sync timing signals. The illustrated module operates with a sixteen megahertz clock signal and an eight megahertz sync signal and is capable of initiating a new transfer cycle on every 125 nanosecond phase of the sync signal.
Each data transfer cycle has at least four such timing phases and the illustrated system is capable of pipelining four cycles on the backplane bus structure. That is, the system is capable of concurrently performing the last phase of one cycle, the third phase of a second cycle, the second phase of still another cycle, and the first phase of a fourth cycle. The phases are termed, in the sequence in which they occur in a cycle, arbitration phase, definition phase, response phase, and data transfer phase. A cycle can be extended in the case of an error to include fifth and sixth, post-data, phases. These timing phases of an operating cycle are discussed further after a description of the signals that can occur on the bus structure during each phase.
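The four-deep pipelining and the phase timing described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that cycle n enters its arbitration phase in timing phase n and that no cycle is extended or aborted; the names used are hypothetical.

```python
PHASE_NAMES = ("arbitration", "definition", "response", "data transfer")

SYNC_HZ = 8_000_000                  # eight megahertz sync signal
PHASE_NS = 1_000_000_000 / SYNC_HZ   # 125.0 nanoseconds per timing phase


def cycles_in_flight(timing_phase):
    """Map each cycle number to the phase it occupies in this timing phase.

    Assumes cycle n arbitrates in timing phase n and that nothing is
    extended or aborted.
    """
    return {
        timing_phase - offset: PHASE_NAMES[offset]
        for offset in range(len(PHASE_NAMES))
        if timing_phase - offset >= 1
    }
```

Under these assumptions, in timing phase 4 the bus structure carries the data transfer of cycle 1, the response of cycle 2, the definition of cycle 3 and the arbitration of cycle 4.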
The illustrated processor module of FIG. 1 can produce the following signals on the bus structure 30 in connection with each timing phase designated. Signals which are noted as duplicated are produced on both the A bus and the B bus; other signals are produced only on the X bus.
Arbitration Phase Signals (Duplicated)
Bus Cycle Request--Any unit which is ready to initiate a bus cycle can assert this signal. The unit which succeeds in gaining bus access in the arbitration phase starts a cycle during the next phase. The central processing unit has lowest priority for arbitration and frees the next timing phase following assertion of this signal to whatever peripheral control unit that secures access in the arbitration phase.
Arbitration Network--This set of signals interconnects arbitration circuits in the different units of the system for determining the unit with the highest priority which is requesting service, i.e., which is producing a Bus Cycle Request. The selected unit is termed the bus master for that cycle.
Definition Phase Signals (Duplicated)
Cycle Definition--The unit designated bus master in the arbitration phase asserts this set of signals to define the cycle, e.g., read, write, I/O, interrupt acknowledge.
Address--The bus master unit asserts the physical address signals identifying the memory or I/O location for the cycle.
Address Parity--The bus master unit also produces a signal to provide even parity of the address and cycle definition signals.
Fast Busy--An addressed slave unit can assert this optional signal to which the central processing unit responds. This signal is followed by a Busy signal during the following Response phase.
Response Phase Signals
Busy--Any unit in a system can assert this signal. It aborts whatever cycle is in the response phase.
Wait--This signal is asserted to extend a cycle and has the effect of repeating the response phase of that cycle and of aborting the following cycle. It is usually asserted by the unit which the bus master unit addressed, i.e. a slave unit which is not ready to effect a data transfer.
Data Transfer Phase Signals (Duplicated)
Data--The data signals, typically sixteen in number, are asserted by the Bus Master unit during a write cycle or by a slave unit during a read cycle.
Upper Data Valid (UDV)--This signal is asserted if the upper byte of the data word is valid.
Lower Data Valid (LDV)--This signal is asserted if the lower byte of the data word is valid.
Data Parity--This signal provides even parity for the data, UDV and LDV lines of the bus structure.
Fast ECC Error--A slave unit asserts this signal during a read operation, with the data, to signal the Bus Master of a correctable memory error. It is followed by both Bus Error signals in a post-data phase. Slow master units such as a disc control unit may ignore this signal and merely respond to the ensuing Bus Error signals.
Miscellaneous Duplicated Signals
Bus PI Request--A unit requiring service asserts one of these signals at the appropriate level of interrupt priority.
Miscellaneous Non-Duplicated Signals
Bus A Error--A unit which detects an error on the A bus asserts this signal during the next timing phase.
Bus B Error--A unit which detects an error on the B bus asserts this signal during the next timing phase.
Bus Clock and Bus Synchronization--The system clock 38 produces these master timing signals.
Maintenance Request--A unit requiring a low priority maintenance service asserts this signal. It is usually accompanied by turning on an indicator light on that unit.
Slot Number--These signals are not applied to the bus structure but, in effect, are produced at the backplane connectors to identify the number and the arbitration priority assigned each unit of the processor module.
Partner Communication--These signals are bused only between partner units.
Bulk Power--These are the electrical power lines (including returns) which the bus structure carries from the bulk power supplies 36a and 36b to different units of the module 10.
Cycle Phases
During an arbitration phase, any unit of the processor module 10 of FIG. 1 which is capable of being a bus master and which is ready to initiate a bus cycle arbitrates for use of the bus structure. The unit does this by asserting the Bus Cycle Request signal and by simultaneously checking, by way of an arbitration network described below, for units of higher priority which also are asserting a Bus Cycle Request. In the illustrated system of FIG. 1, the arbitration network operates with the unit slot number, and priority is assigned according to slot positions. The unit, or pair of partnered units, which succeeds in gaining access to the bus structure during the arbitration phase is termed the bus master and starts a transfer cycle during the next clock phase.
The central processing unit 12, 14 in the illustrated system has the lowest priority and does not connect to the arbitration lines of the bus structure. The CPU accordingly does not start a cycle following an arbitration phase, i.e., a timing phase in which a Bus Cycle Request has been asserted. It instead releases the bus structure to the bus master, i.e. to the successful peripheral unit. Further, in the illustrated system, each memory unit 16, 18 is never a master and does not arbitrate.
During the definition phase of a cycle, the unit which is determined to be the bus master for the cycle defines the type of cycle by producing a set of cycle definition or function signals. The bus master also asserts the address signals and places on the address parity line even parity for the address and function signals. All units of the processor module, regardless of their internal operating state, always receive the signals on the bus conductors which carry the function and address signals, although peripheral control units can operate without receiving parity signals. The cycle being defined is aborted if the Bus Wait signal is asserted at this time.
During the response phase, any addressed unit of the system which is busy may assert the Busy signal to abort the cycle. A memory unit, for example, can assert a Bus Busy signal if addressed when busy or during a refresh cycle. A Bus Error signal asserted during the response phase will abort the cycle, as the error may have been with the address given during the definition phase of the cycle.
Further, a slow unit can assert the Bus Wait signal to extend the response phase for one or more extra timing intervals. The Bus Wait aborts any cycle which is in the definition phase.
Data is transferred on both the A bus and the B bus during the data transfer phase for both read and write cycles. This enables the system to pipeline a mixture of read cycles and write cycles on the bus structure without recourse to re-arbitration for use of the data lines and without having to tag data as to the source unit or the destination unit.
Full word transfers are accompanied by assertion of both UDV and LDV (upper and lower data valid) signals. Half word or byte transfers are defined as transfers accompanied by assertion of only one of these valid signals. Write transfers can be aborted early in the cycle by the bus master by merely asserting neither valid signal. Slave units, which are being read, must assert the valid signals with the data. The valid signals are included in computing bus data parity.
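The relationship between the valid signals and bus data parity can be sketched in Python as follows. The sketch assumes a sixteen-bit data word and computes the single even-parity bit over the data, UDV and LDV lines; the function names and the returned labels are hypothetical.

```python
def data_parity(data_word, udv, ldv, width=16):
    """Even-parity bit covering the data lines plus the UDV and LDV lines.

    The returned bit makes the total count of asserted lines, including
    the parity line itself, an even number.
    """
    mask = (1 << width) - 1
    ones = bin(data_word & mask).count("1") + int(udv) + int(ldv)
    return ones % 2


def transfer_kind(udv, ldv):
    """Classify a transfer from its two valid signals."""
    if udv and ldv:
        return "full word"
    if udv or ldv:
        return "byte"
    return "aborted write"   # neither valid signal asserted
```

For instance, a full word transfer of the value 0x0001 has three asserted lines (one data line, UDV and LDV), so the parity bit is 1.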
Errors detected during the data transfer phase will cause the unit which detects the error to assert one or both of the Bus Error signals in the next timing phase, which is a first post-data phase. In the illustrated module of FIG. 1, the peripheral control units wait to see if an error occurs before using data. The central processing unit and the main memory unit of the system, however, use data as soon as it is received and in the event of an error, in effect, back up and wait for correct data. The assertion of a Bus Error signal during a post-data phase causes the transfer phase to be repeated during the next, sixth, phase of the transfer cycle. This aborts the cycle, if any, that would otherwise have transmitted data on the bus structure during this second post-data, i.e. sixth, phase.
The normal backplane mode of operation of the illustrated system is when all units are in the Obey Both mode, in which both the A bus and the B bus appear to be free of error. In response to an error on the A bus, for example, all units synchronously switch to the Obey B mode. The illustrated processor module 10 returns to the Obey Both mode of operation by means of supervisor software running in the central processing unit.
In both the Obey B and the Obey A modes of operation, both the A bus and the B bus are driven by the system units and all units still perform full error checking. The only difference from operation in the Obey Both mode is that the units merely log further errors on the one bus that is not being obeyed, without requiring data to be repeated and without aborting any cycles. A Bus Error signal however on the obeyed bus is handled as above and causes all units to switch to obey the other bus.
As stated, the FIG. 1 power supply 36 provides electrical operating power to all units of the system from the two bulk supplies 36a and 36b. In the illustrated system, one bulk supply provides operating power only to all even slot positions and the other provides power only to all odd slot positions. Thus in a fully redundant system according to the invention, a failure of one bulk supply 36a, 36b only stops operation of half the system; the other half remains operative.
Pipelined Phases
FIG. 2 illustrates the foregoing operation with four pipelined multiple-phase transfer cycles on the bus structure for the FIG. 1 module 10. Waveforms 56a and 56b show the master clock and master synchronization signals which the FIG. 1 clock 38 applies to the X bus 46, for twenty-one successive timing phases numbered (1) to (21) as labeled at the top of the drawing. The arbitration signals on the bus structure, represented with waveforms 58a, change at the start of each timing phase to initiate, in each of the twenty-one illustrated phases, arbitration for a new cycle as noted with the cycle-numbering legend #1, #2, #3 . . . #21. FIG. 2 also represents the cycle definition signals with waveform 58b. The cycle definition signals for each cycle occur one clock phase later than the arbitration signals for that cycle, as noted with the cycle numbers on the waveform 58b. The drawing further represents the Busy, Wait, Data, A Bus Error, and B Bus Error signals. The bottom row of the drawing indicates the backplane mode in which the system is operating and shows transitions between different modes.
With further reference to FIG. 2, during timing phase number (1), the module 10 produces the cycle arbitration signals for cycle #1. The system is operating in the Obey Both mode as designated. The Bus Master unit determined during the cycle arbitration of phase (1) defines the cycle to be performed during timing phase (2), as designated with the legend #1 on the cycle definition signal waveform 58b. Also in timing phase (2), the arbitration for a second cycle, cycle #2, is performed.
During timing phase (3) there is no response signal on the bus structure for cycle #1, which indicates that this cycle is ready to proceed with a data transfer as occurs during timing phase (4) and as designated with the #1 legend on the data waveform 58e. Also during timing phase (3), the cycle definition for cycle #2 is performed and arbitration for a further cycle #3 is performed.
In timing phase (4), the data for cycle #1 is transferred, and the definition for cycle #3 is performed. Also, a Bus A Error is asserted during this timing phase as designated with waveform 58f. The error signal aborts cycle #2 and switches all units in the module to the Obey B mode.
The Bus A Error signal of timing phase (4) indicates that in the prior timing phase (3) at least one unit of the system detected an error regarding signals from the A bus 42. The error occurred when no data was on the bus structure, as indicated by the absence of data in waveform 58e during timing phase (3), and there hence is no need to repeat a data transfer.
During timing phase (5), with the system operating in the Obey B mode, a fifth cycle is arbitrated, the function for cycle #4 is defined and no response signal is present on the bus structure for cycle #3. Accordingly that cycle proceeds to transfer data during time phase (6), as FIG. 2 designates. Also in time phase (6), a Bus Wait is asserted, as appears in waveform 58d; this is in connection with cycle #4. The effect is to extend that cycle for another timing phase and to abort cycle #5.
A new cycle #7 is arbitrated in timing phase (7) and the definition operation proceeds for cycle #6. In time phase (8), the data for cycle #4 is applied to the bus structure for transfer.
Also in time phase (8), a Busy signal is asserted. This signal is part of the response for cycle #6 and aborts that cycle.
The arbitration and definition operations in time phase (9) follow the same pattern but another Bus A Error is asserted. The system already is operating in the Obey B mode and accordingly the response to this signal is simply to log the error.
The Bus Wait signal asserted in time phase (10) and continuing to time phase (11) extends cycle #8 for two further time phases, so that the data for that cycle is transferred during time phase (13), as designated. The Bus Wait signal asserted during these phases also aborts cycles #9 and #10, as shown. Any Busy signal asserted during phase (10), (11) or (12) in view of the extension of cycle #8 by the Wait signal, would abort cycle #8. Note that the data transfer for cycle #7 occurs in time phase (10) independent of the signals on the Wait and the Busy conductors during this time phase.
Further Bus A Error signals occurring during time phases (11), (12) and (14) again have no effect on the system other than to be logged, because the system is already operating in the Obey B mode.
The Wait signal asserted during time phase (14) aborts cycle #13. Also, it extends cycle #12, which however is aborted by the Busy signal asserted during time phase (14). This, however, is not a common sequence.
Data for cycle #11 is transferred in the normal sequence during time phase (14). Further, the data transfer for cycle #14 occurs in time phase (17).
In time phase (19), immediately following the cycle #15 data transfer of time phase (18), a Bus B Error is asserted. This error signal aborts cycle #17, which is in the response phase, and initiates a repeat of the data transfer for cycle #15. The repeat transfer occurs during time phase (20). Further, this error signal switches the module to the Obey A mode.
Control logic in each unit of the FIG. 1 processor module 10 provides the operations in that unit for executing the foregoing bus protocol which FIG. 2 illustrates. The protocol which control logic in each peripheral control unit thus provides includes conditioning the unit, when first turned on, to receive signals on both the A bus 42 and the B bus 44 and to process the two sets of signals as if they are identical. Each illustrated central processor unit and memory unit, which process signals received from a single one of the duplicated buses, initially receives signals on the A bus 42, but operates as if the signals on the B bus 44 are identical. Further, the control logic in all units initially conditions the unit to transmit signals identically on both the A and the B buses, in lock-step synchronism.
The control logic in each illustrated peripheral control unit responds to the A bus error signal and to the B bus error, transmitted on the X bus 46, to condition the unit for the following operation. A Bus Error signal for the A (or B) bus causes the unit, and hence all units in a processor module, to stop receiving on both buses and to receive only on the other bus, i.e. the B (or A) bus, commencing with the first timing interval following the one in which the Bus Error signal first appears on the X bus. The units continue however to transmit signals on both the A and the B buses.
After a peripheral control unit has responded to an A (or B) Bus Error signal by switching to receiving on only the B (or A) bus, the control logic therein does not again switch in response to further Bus Error signals for the A (or B) bus; it essentially ignores the further error signals. However, the control logic responds to a B (or A) Bus Error signal by switching the unit to receive on the A (or B) bus, and it then ignores further B (or A) Bus Error signals.
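The receive-mode switching described in the preceding paragraphs amounts to a small state machine, sketched below in Python. The enum and function names are hypothetical. The sketch also covers the case, noted elsewhere in this description, in which Bus Error signals for both buses are asserted together (a memory ECC error) and no change of bus occurs. Return to the Obey Both mode is left to supervisor software and is not modeled.

```python
from enum import Enum


class Mode(Enum):
    OBEY_BOTH = "Obey Both"
    OBEY_A = "Obey A"
    OBEY_B = "Obey B"


def next_mode(mode, bus_a_error, bus_b_error):
    """Return the backplane mode for the timing interval after the error."""
    if bus_a_error and bus_b_error:
        return mode                      # both asserted: no change of bus
    if bus_a_error and mode in (Mode.OBEY_BOTH, Mode.OBEY_A):
        return Mode.OBEY_B               # error on an obeyed A bus
    if bus_b_error and mode in (Mode.OBEY_BOTH, Mode.OBEY_B):
        return Mode.OBEY_A               # error on an obeyed B bus
    return mode                          # error on the bus not being obeyed
```

Replaying the FIG. 2 sequence through this function (a Bus A Error in phase (4), further Bus A Errors in phases (9), (11), (12) and (14), then a Bus B Error in phase (19)) takes the model from Obey Both to Obey B and finally to Obey A.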
In the illustrated module, faulty information is transmitted on the A and/or B buses generally only by the central processing unit and by the memory unit. This is because the illustrated peripheral control units check for faults prior to transmitting information on the A and B buses. If a fault is detected, the control unit in question does not transmit information, and only the partner unit does.
Further, each unit applies address and data signals on the A and B buses with parity which that unit generates. The memory unit serves, in the illustrated embodiment, to check bus parity and to drive the appropriate bus error line of the X bus 46 during the timing interval immediately following the interval in which it detected the bus parity error. The memory unit also sets a diagnostic flag and requests a diagnostic interrupt.
All units of a module which arbitrate for access to the bus structure, as discussed further in the next section, include logic that checks for false operation of the bus arbitration logic and that drives the appropriate bus error line--in the event of such a fault--on the interval following the detection of the fault. This is described further with reference to FIG. 12B. The unit also sets a diagnostic flag and requests a diagnostic interrupt.
The bus protocol which control logic in each unit provides further conditions that unit to provide the following operation in response to a Bus Error signal for the bus which the unit is presently conditioned to receive. (These operations do not occur for a Bus Error signal for a bus which is not being received; as noted the unit essentially ignores such an error signal.) A unit which was transmitting cycle definition signals during the interval immediately preceding the one in which the Bus Error signal appears on the X bus re-initiates that cycle, including arbitration for the bus, if that cycle continues to be needed. This is because the Error signal causes any unit receiving the cycle definition signals to abort that cycle.
A unit which was transmitting data signals during the timing interval immediately preceding the one in which the Bus Error signal appears on the bus repeats the data transmission two intervals after it was previously sent, i.e. on the interval following the one in which the Error Signal appears on the bus.
A unit receiving definition signals for a cycle and which is identified (addressed) by such signals responds to the Bus Error during the next interval by aborting that cycle.
A unit which was receiving data signals during the interval immediately preceding the one in which the Bus Error signal appears on the bus ignores that data and receives a re-transmission of that data two intervals after the ignored one. An alternative is for the unit to receive and latch the data from both buses and use only the data from the good bus.
When a unit simultaneously receives Bus Error signals for both the A and the B buses, which indicates a memory ECC error, the unit responds exactly as it does to a Bus Error signal for a single bus being received, as discussed above, except that it does not make any change in the bus(es) to which it is responding. Thus an ECC error aborts any cycle that was placing cycle definition signals on the bus in the preceding interval, and it causes any data transfer in that preceding interval to be repeated in the next interval following the ECC error.
As FIG. 2 illustrates, a Wait signal aborts any cycle placing definition signals on the bus in the same interval when the Wait signal occurs, and it delays, until the second interval after the Wait terminates, the data transfer for a cycle that placed definition signals on the bus in the interval preceding initiation of the Wait. The occurrence of a Busy signal aborts a cycle that was placing definition signals on the bus in the preceding interval.
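The combined effect of the Wait, Busy and Bus Error signals on pipelined cycles can be explored with the following Python sketch. It is a simplified model, not the control logic of the illustrated units: it assumes cycle n arbitrates in timing phase n, it treats only Bus Error signals for an obeyed bus, and it assumes a Busy or Bus Error signal is tested before a Wait signal in the same interval. All names are hypothetical.

```python
def schedule_cycles(num_cycles, wait_phases=(), busy_phases=(), error_phases=()):
    """Return (data_phase, repeats) for cycles 1..num_cycles.

    data_phase maps each cycle to the timing phase of its data transfer,
    or to None if the cycle is aborted. repeats maps a cycle to the phase
    in which its data transfer is repeated after a Bus Error signal.
    error_phases lists only errors on a bus that is being obeyed.
    """
    waits, busies, errors = set(wait_phases), set(busy_phases), set(error_phases)
    data_phase = {}
    for cycle in range(1, num_cycles + 1):
        definition, response = cycle + 1, cycle + 2
        if definition in waits:
            data_phase[cycle] = None          # Wait aborts a cycle being defined
            continue
        while response not in busies and response not in errors and response in waits:
            response += 1                     # Wait repeats the response phase
        if response in busies or response in errors:
            data_phase[cycle] = None          # Busy or Bus Error aborts the cycle
        else:
            data_phase[cycle] = response + 1  # data follows the last response phase
    repeats = {
        cycle: phase + 2                      # repeated two intervals after first sent
        for cycle, phase in data_phase.items()
        if phase is not None and phase + 1 in errors
    }
    return data_phase, repeats
```

Calling `schedule_cycles(17, wait_phases={6, 10, 11, 14}, busy_phases={8, 14}, error_phases={4, 19})` reproduces the outcomes narrated for FIG. 2 under these assumptions: cycles #2, #5, #6, #9, #10, #12, #13 and #17 are aborted, cycle #4 transfers data in phase (8), cycle #8 in phase (13), and the cycle #15 transfer of phase (18) is repeated in phase (20).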
Control logic for implementing the foregoing bus protocol and related operations in the several units of a processor module for practice of this invention can be provided using conventional skills, and is not described further, other than as noted.
Arbitration Network
With reference to FIG. 3, the illustrated processor module 10 of FIG. 1 has two arbitration networks, a network 252 connected with the set of arbitration conductors 254 of the A bus 42 and another network (not shown) connected with the arbitration conductors of the B bus 44. The two networks are identical. Each arbitration network includes an arbitration circuit in each unit that competes to initiate a cycle on the bus structure. Thus each such unit has two arbitration circuits, one of which connects to the A bus 42 and the other to the B bus 44. Each arbitration network, which thus includes conductors of one bus 42 or 44 and arbitration circuits, provides an automatic hardware determination of which unit, or pair of partner units, that requests access to the bus structure has priority to initiate an operating cycle. That is, the arbitration network receives a Cycle Request signal from a unit when the operation of that unit requires a data transfer with another unit of the system, and the arbitration network determines, in each timing phase, which requesting unit has highest priority.
Each unit that arbitrates for access to the bus structure is assigned a relative priority according to the slot number at which that unit connects to the bus structure. In the illustrated system slot number zero has the lowest priority, and partner units are assigned successive slot numbers, an even number and the next odd number.
FIG. 3 illustrates the arbitration network 252 of the A bus with the connection of a set of four arbitration conductors 254a, 254b, 254c and 254d of that bus to sixteen electrical receptacles 256a, 256b, . . . 256p on the system backplane. Each receptacle 256 is assigned a slot number, the illustrated receptacles being numbered accordingly from zero to fifteen. Each receptacle 256 is illustrated simply as a vertical column of connections to the four arbitration conductors 254 and to a cycle request conductor 258. This network thus has four arbitration conductors and can handle up to 2^4, or sixteen, units, each connected to a separate receptacle 256. A network with five arbitration conductors, for example, can handle up to thirty-two access-requesting units.
The cycle request conductor 258 extends continuously along the A bus 42 to all the receptacles, as FIG. 3 shows. The arbitration conductors 254, on the other hand, are segmented according to binary logic such that only one, the conductor 254d, which is assigned the binary value 2^3, extends continuously to all sixteen connectors. This conductor carries a signal designated Inhibit (8) (Inh 8). The remaining conductors 254c, 254b, and 254a are designated as carrying respectively an Inhibit (4) signal, an Inhibit (2) signal and an Inhibit (1) signal. The arbitration conductor 254c is segmented so that each segment connects to eight successive priority-ordered receptacles 256. Thus, this conductor 254c has a first segment which connects together the receptacles assigned to slot numbers (0) through (7) and has a second segment which connects together the receptacles in slot numbers (8) through (15). Similarly, the Inhibit (2) conductor 254b is segmented to connect together every four successive priority-ordered receptacles, and the conductor 254a is segmented to connect together only every two successive ordered receptacles. In each instance there is no connection along a given arbitration conductor between the different segments thereof or between different ones of those conductors.
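The segmentation just described follows a simple rule: the conductor of binary weight 2^k is cut into segments of 2^(k+1) successive slots. The Python sketch below illustrates that rule; the function names max_units and segment_slots are hypothetical and are used here only for illustration.

```python
def max_units(n_conductors):
    """Number of arbitrating slots that n arbitration conductors can serve."""
    return 2 ** n_conductors


def segment_slots(slot, bit):
    """Slots sharing one segment of the Inhibit(2**bit) conductor with `slot`.

    bit 0 is Inhibit (1), bit 1 is Inhibit (2), bit 2 is Inhibit (4) and
    bit 3 is Inhibit (8) in the sixteen-slot network of FIG. 3.
    """
    size = 2 ** (bit + 1)          # segment length in slots
    start = (slot // size) * size  # first slot of the segment
    return range(start, start + size)


# Segments seen by the unit at slot (6):
#   Inhibit (1): slots 6-7     Inhibit (2): slots 4-7
#   Inhibit (4): slots 0-7     Inhibit (8): slots 0-15
for bit in range(4):
    seg = segment_slots(6, bit)
    print(f"Inhibit ({2 ** bit}): slots {seg.start}-{seg.stop - 1}")
```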
A bus terminator 260 on the backplane connects the INH 8 arbitration conductor 254d and the cycle request conductor 258 to a positive supply voltage through separate pull-up resistors 262, 262. Further pull-up resistors 262 are connected from each segment of the arbitration conductors 254a, 254b and 254c to the pull-up supply voltage. These connections thus tend to maintain each conductor 254 segment and conductor 258 at a selected positive voltage, i.e., in a pull-up condition. A grounded or other low voltage external signal is required to pull the voltage of any given conductor or conductor segment down from that normal positive condition.
FIG. 3 further shows an arbitration circuit 264g for one typical unit in a processor module according to the invention. The illustrated arbitration circuit is for the unit connected to the bus receptacle 256g at slot number (6). An identical circuit 264 can be connected to each other receptacle 256a, 256b . . . etc., up to the number of arbitrating units in the module. Central processing units and memory units do not connect to the arbitration network, but the illustrated processing units respond to slot numbers zero and one. Hence for the processor module 10 of FIG. 1, by way of illustrative example, the link units 32 and 34 have the next lowest arbitration priority and the circuits 264 therein are connected to receptacles 256c and 256d. No unit is connected to receptacle 256e and the tape unit 28 is connected to receptacle 256f. The circuits 264 in the communication units 24 and 26 and in the disc units 20 and 22 are connected to receptacles 256g, 256h, 256i and 256j, respectively.
The illustrated arbitration circuit 264g includes a separate pull-up resistor 262 connected to the pull-up supply voltage from the connections therein to segments of conductors 254c, 254b, and 254a. The circuit 264g further has a flip-flop 266 that is switched to the set state in response to a Request signal produced within the unit. The set output from the flip-flop 266 is applied to one input of each of four NAND gates 268a, 268b, 268c and 268d and to both inputs of a further NAND gate 269. The illustrated arbitration circuit also has a set of four selective connections 270a, 270b, 270c and 270d, each of which applies either a ground level or an assertive positive voltage to one input of the NAND gates 268a, 268b, 268c and 268d, respectively. The set of connections 270 is associated with one specific backplane slot and is set according to that slot number and hence to specify the arbitration priority of the unit plugged in or otherwise connected to that slot. Accordingly, the connections of the illustrated circuit 264g for slot number (6) are set as illustrated to apply the binary equivalent of this slot number, i.e., 0110, to the four NAND gates. One preferred arrangement to produce the multiple digit parallel signal identifying each slot number is to provide a binary-coded set of connections 270 on the backplane at each connection to it.
The output signals from the NAND gates 268 are connected to the arbitration conductors and to OR gates 272, the outputs of which are applied to an AND gate 274. More particularly, the output from the NAND circuit 268a associated with the binary value 2^0, and connected with the connection 270a, is connected to the Inhibit (1) bus conductor 254a and to an input of OR gate 272a. Similarly, the outputs from the next three higher binary-valued NAND gates 268b, 268c and 268d are connected respectively to the Inhibit (2), Inhibit (4), and Inhibit (8) bus conductors, and to one input of the OR gates 272b, 272c and 272d respectively, as shown. The output from the request NAND gate 269 is connected to the cycle request conductor 258.
The arbitration circuit 264g of FIG. 3 produces an assertive output signal, termed Grant A, from the output AND gate 274 when it receives a Request signal at the flip-flop 266 in a time phase when no arbitration circuit connected to higher priority backplane connectors 256 receives a like request signal. More particularly, when the unit in which the illustrated arbitration circuit 264g is connected applies a request signal to the flip-flop 266, the resultant assertive signal from the set output terminal actuates the four NAND gates 268a, 268b, 268c and 268d to apply to the arbitration conductors 254a, 254b, 254c and 254d a set of signals corresponding to the backpanel slot number as produced by the connection 270. The flip-flop 266 also actuates the NAND gate 269 to apply an assertive signal to the Cycle Request conductor 258. That is, when the output of the flip-flop 266 is at a high assertive value, it applies a high input signal to the NAND gate 268a, which also receives a low input signal from the slot-number connection 270a. The gate 268a accordingly produces a high level output signal which does not pull down the normal plus V level of the Inhibit (1) conductor 254a. Each NAND gate 268b and 268c, on the other hand, receives both a high level input signal from the flip-flop 266 and from the connection 270b, 270c to which it is connected and accordingly applies a low level signal to the Inhibit (2) and Inhibit (4) conductors, respectively. The NAND gate 268d produces a high level output to the Inhibit (8) conductor, which remains at the normal pull-up value. The cycle request conductor 258 is pulled down from that level by a low level output from the NAND gate 269.
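The levels that one arbitration circuit applies to the conductors can be reproduced with a few lines of Boolean logic. The Python sketch below is illustrative only; the function names nand and drive_levels are hypothetical, a value of 1 stands for a high level (conductor left at the pull-up voltage) and 0 for a low level (conductor pulled down).

```python
def nand(a, b):
    """Two-input NAND on 0/1 levels."""
    return 0 if (a and b) else 1


def drive_levels(slot, request):
    """Levels applied by one arbitration circuit 264.

    Returns (inhibit, cycle_request): `inhibit` maps the weights 1, 2, 4, 8
    to the outputs of NAND gates 268a-268d, and `cycle_request` is the
    output of NAND gate 269, both of whose inputs are the flip-flop output.
    """
    q = 1 if request else 0                         # set output of flip-flop 266
    bits = [(slot >> k) & 1 for k in range(4)]      # connections 270a-270d
    inhibit = {1 << k: nand(q, bits[k]) for k in range(4)}
    cycle_request = nand(q, q)
    return inhibit, cycle_request


# Slot (6) = 0110 with a request pending: Inhibit (2) and Inhibit (4) are
# pulled low, Inhibit (1) and Inhibit (8) stay high, Cycle Request is low.
print(drive_levels(6, True))
```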
Each OR gate 272 receives as input signals one digit of the slot-number signal and the potential on the corresponding arbitration conductor at that slot. By virtue of the connections of the NAND gate 268 outputs to the segmented arbitration conductors 254, a request signal applied to a higher priority arbitration circuit 264 alters the signals which the OR gates 272 in circuit 264g otherwise receive from within that circuit 264g. A request signal applied to a lower priority arbitration circuit 264, on the other hand, does not alter the states of the signals applied to the OR gates 272 in the arbitration circuit 264g.
In particular, in the absence of any other arbitration circuits receiving an assertive request signal, the OR gate 272a in the arbitration circuit 264g receives a high level signal from the NAND gate 268a and receives a low level signal from the connection 270a; it accordingly produces a high level output signal. The same input signals are applied to the OR gate 272d and it also produces a high level output signal. The OR gate 272b on the other hand receives a low level signal from the NAND gate 268b and receives a high level signal from the connection 270b. Hence the OR gate 272b receives two different valued input signals and produces a high level output signal. The input conditions to the OR gate 272c also differ in this same manner. Thus, under this operating condition, all four OR gates 272 produce identical, high level, output signals. In response, the AND gate 274 produces an assertive Grant A output signal on line 278. This signal causes the associated unit of the processor module to initiate a cycle of operations, as discussed above with reference to FIG. 2.
In the event that an arbitration circuit 264 in a lower priority unit is also activated by a request signal, the input signals to the OR gates 272 of the illustrated arbitration circuit 264g are unchanged from the example just described. However, in the event a higher priority unit produces a request signal, the inputs to the OR gates of the illustrated arbitration circuit 264g are different, and the output AND gate 274 does not produce an assertive signal. For example, when the system unit connected to the next higher priority receptacle 256h produces a request signal, the arbitration circuit therein applies a low level signal not only to the Inhibit (4) and Inhibit (2) conductors, but also to the Inhibit (1) conductor. The resultant low level signal on the latter conductor is applied to the OR gate 272a in the circuit 264g connected to the number (6) slot. That OR gate accordingly produces a low level output signal, thereby inhibiting the AND gate 274 at slot (6) from producing an assertive output signal.
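The behavior of the whole network can be checked with a short simulation of the wired conductor segments and the OR/AND gating. The Python sketch below is a simplified model and not the circuit itself; the names conductor_level, grant and granted_slots are hypothetical. It assumes that a unit without a pending request never acts on a grant, which the text implies but which is modeled here as an explicit check.

```python
N_BITS = 4  # four arbitration conductors, sixteen slots


def conductor_level(slot, bit, requesting):
    """Level (1 high, 0 low) of the Inhibit(2**bit) segment seen at `slot`.

    A segment is pulled low when any requesting unit on that same segment
    has a 1 in this digit of its slot number (open-collector NAND output).
    """
    size = 1 << (bit + 1)
    start = slot - slot % size
    for other in requesting:
        if start <= other < start + size and (other >> bit) & 1:
            return 0
    return 1


def grant(slot, requesting):
    """Output of AND gate 274 for the unit at `slot`."""
    if slot not in requesting:
        return False  # assumption: no request, no grant
    # Each OR gate 272 combines one slot-number digit with the conductor level.
    return all(((slot >> bit) & 1) or conductor_level(slot, bit, requesting)
               for bit in range(N_BITS))


def granted_slots(requesting):
    """All slots whose arbitration circuit asserts Grant in this phase."""
    return [s for s in sorted(requesting) if grant(s, requesting)]


print(granted_slots({6}))            # slot (6) alone
print(granted_slots({3, 6}))         # a lower priority request is ignored
print(granted_slots({6, 7}))         # slot (7) inhibits slot (6)
```

In this model the requesting unit with the highest slot number is the only one that receives a grant, consistent with the priority ordering described for the illustrated system.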
Note that the foregoing operation employs NAND gates 268 that produce a high level output signal with a relatively high impedance. A NAND gate with an open-collector circuit, for example, provides this operation, which facilitates pulling the voltage on an arbitration conductor segment to a low level.
The arbitration circuit 264g in FIG. 3 further has an OR gate 280 connected between the switch 270a and an input to the OR gate 272a. The other input to the OR gate 280 is an assertive level that comes from a hardware status flag that is set to allow an even-odd pair of backplane slots, to which are connected two units operating as partners, to arbitrate as a single unit. The OR gate 280 thus is optional and is used only where a unit of the system 10 may operate in lock-step synchronism with a partner unit.
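The effect of the OR gate 280 on the lowest-order digit can be shown in isolation. The Python sketch below is an interpretation of the description, assuming that the output of gate 280 takes the place of the slot-number digit at the input of OR gate 272a; the function name or_272a_output and its parameter names are hypothetical.

```python
def or_272a_output(slot_bit0, inhibit1_level, pair_flag):
    """Output of OR gate 272a when OR gate 280 is present.

    slot_bit0      -- lowest digit of the slot number (connection 270a)
    inhibit1_level -- level on the Inhibit (1) segment (1 high, 0 low)
    pair_flag      -- 1 when the status flag lets the even-odd pair
                      arbitrate as a single unit
    """
    gated_bit = slot_bit0 | pair_flag  # OR gate 280
    return gated_bit | inhibit1_level  # OR gate 272a


# Slot (6) while its odd partner at slot (7) pulls Inhibit (1) low:
print(or_272a_output(0, 0, pair_flag=0))  # 0: slot (6) is inhibited
print(or_272a_output(0, 0, pair_flag=1))  # 1: slot (6) is not inhibited by its partner
```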
It will now be understood that each unit of a processing module which competes, via the arbitration network, to define a bus cycle has two arbitration circuits 264. One is connected to the A bus as FIG. 3 shows, and the other is connected in the identical manner t